Path: utzoo!attcan!uunet!wuarchive!gem.mps.ohio-state.edu!apple!rutgers!sunybcs!oswego!news
From: dl@g.g.oswego.edu (Doug Lea)
Newsgroups: comp.sw.components
Subject: Re: Garbage Collection & ADTs
Message-ID:
Date: 27 Sep 89 20:21:03 GMT
References: <900@scaup.cl.cam.ac.uk> <6530@hubcap.clemson.edu>
<909@scaup.cl.cam.ac.uk> <62342@tut.cis.ohio-state.edu>
<599@hsi86.hsi.UUCP> <16270@brunix.UUCP>
Sender: news@oswego.Oswego.EDU (Network News)
Reply-To: dl@oswego.edu
Organization: SUNY Oswego
Lines: 285
In-reply-to: twl@brunix's message of 26 Sep 89 14:32:39 GMT
First a warning,
I do quite a lot of C++ programming
and a perhaps slightly contentious definition,
I take OO programming to be that which is in part or whole concerned
- -
| construction |
with the | mutation | of objects
| inspection |
| destruction |
- -
(And, of course, OO design to be concerned with the specification,
definition, interrelationships, etc., of such objects...)
Only half-facetiously, one could make the vicious claim that GC
removes programmer control over 1/4 of the OO programming paradigm: In
C++, at least, the notion of destruction of an object may well require
some structured, active, resource deallocation, and need not just
consist of passive reclamation of memory cells. The classic example
where such control is useful is a Window object that needs to erase
itself from view upon destruction. In a GC-based system, the
mechanisms for implemententing such actions seem less than natural --
either the programmer would need to `logically' destroy the window,
and then let the collector sometime later reclaim space, or perhaps
the collector could be designed to do all kinds of storage reclamation
associated with the object at some indeterminate time.
My points are just that
Structured destruction is a good idea. (As is structured construction.)
Garbage collection can get in the way of this.
And again, half-facetiously:
GC is sometimes a byproduct of functional programming methodologies
that allow programmers to pay attention only to `values', and
not the objects required in order to arrive at them.
All that said, I think GC is a very useful tool! Really!
It is a serious contender if
Destruction consists only of releasing storage
and
The lifetimes of objects are so unstructured/complex that manual
reclamation is either slower, or more error-prone than GC.
This kind of situation comes up a lot.
But often a redesign that forces a more structured lifetime discipline
supporting simpler techniques is at least as attractive. I take that as
Wulf's main point. I think it's a good one.
How about a real example? For the amusement of C++ fans following
this discussion, I'll append one below. It's a fun variation of a
classic kind of situation where GC is commonly used. There are Cells
that may be linked together to create computational pipelines. While I
set it up to use GC, other choices are possible. Here are some of the
not-terribly-profound issues that came to my mind while writing it as
a classroom example:
* The main() doesn't really exercise the collector. One chain
is created and destroyed during the run.
* If this were a one-shot closed design, culminating only in a
cute but slow prime number sieve program, the best
solution is probably to do nothing, letting all
deletion occur at program termination, since there are
no other useful aspects of destruction.
* If this were still self-contained, but a variation of
the current main() were used as a callable function,
then it would make sense to insert code to mark the
base of the chain, use a stack-based allocator, and
release it on destruction via modified constructor/
destructor code. You'd also want to insert code to
ensure that at most one chain were active at a time.
* In fact, the *current* code using Sieve consists only of single
chains of Cells. If only such applications were supported,
* One could delete space via a virtual no-op destructor
in Cell, and one in Filter like
~Filter() { delete source; }
to cause a chain of deletions to occur on destruction.
* Except that the Cells do not know if their sources were
dynamically allocated or not, so `delete source' can be an
error if the source is a local. Perhaps, Cells could
refuse to support `local' creation, and only exist
dynamically. Alternatively some bookkeeping could be added
inside each Cell to record whether it was dynamically
allocated, and constructors and destructors would need to
use it. The former is probably a better strategy, since
the implementation is not set up to cleanly support
by-value argument passing of Cells, etc., anyway, and
there seems to be every reason not to do so.
* Digression: Actually, this is all a bit of a pain in
C++. In order to avoid clients having to say `new
Whatever', and thereafter dealing with pointers, you'd
need to set up a parallel set of classes, each of
which served as pointers to the `real' objects of the
corresponding classes, and then took care of the
operator `new' stuff on creation, etc. It's a case
where the convention taken in Eiffel, Smalltalk, etc.,
that names are by default just bound to references to
objects, not the objects themselves, would be more
convenient. In these kinds of situations, C++
programmers often settle for requiring clients to use
pointers or references explicitly. But even this has
drawbacks, since, while you can force clients to only
create objects via `new', the only means for doing this
leaves the restriction as a mostly run-time, not
compile-time error.
* But it is very possible for someone to make two Filters
point to the same source, or to create new subclasses with
multiple sources or sinks.
* The simple GC solution would continue to work in such cases.
* In order to deal with this and still keep virtual
destructors, some form of reference counting would be
required. Even this won't suffice if forms of circular
paths that give reference counters problems were created,
but such possibilities seem extremely unlikely, and
defensible as a documented restriction, akin to the
builtin limitation that only, say 32767 references are
allowed before an object is considered permanent. The
work involved in adapting this code to use reference
counts is not great, but is greater than using GC. The
overhead for reference counting vs GC would probably be
close, depending on the quality of the implementations. In
either case, storage management would probably amount to
a tiny percentage of execution time.
* Except that it makes no sense in *this* code for anyone
to hook up two Filters to the same source, so perhaps
Cells should record whether they've been used as sources
and refuse to comply if they are re-attached. But this
is reference-counting, just used in a different way.
* Except that maybe someone might want to make a subclass of
Cells (e.g., random number generators) that would be
useful to multiply tap.
* In either case, reference-counting might be more flexible,
since it could be used both for allocation and to guard
against some kinds of Cells being multiply tapped when
this is would be an error.
* What about programmers creating subclasses that display
themselves in action on a screen, or record their
actions in a log file? In such cases the reference-counting
version seems necessary, since destruction would require
real-time screen updates or file manipulation.
So, perhaps, reference-counting would be more appropriate, since it
may better serve the goal of a robust, reusable, extensible design.
I'll leave implementation as an exercise to the reader (in other
words, I'm too lazy to do it out right now!)
The example has no particular deep moral, except to reflect my feeling
that thinking about the full lifetimes of objects is good for you! It
helps focus attention on various aspects of the semantics of new types
in ways you otherwise might not have thought much about. And further,
that GC-oriented issues and reusability *do* interact, but that GC need
not always be chosen on such grounds.
I suppose it's worth a mention that the issues based on extensibility
via inheritence (i.e., design reuse) don't much come up in Ada.
Perhaps others could offer other analyses of this or other examples.
------------------------------- sieve.cc
// Fun with the prime number sieve.
// Based loosely on R. Sethi ``Programming Languages'', sec 6.7
// link to Boehm's comp.sources.unix C GC routines ...
extern "C" void gc_init();
extern "C" void* gc_malloc(unsigned int);
class GC // ... but wrap gc routines as a C++ class
{
public:
GC() { gc_init(); }
void* alloc(unsigned int sz) { return gc_malloc(sz); }
};
class Cell // an object with state and a way to update and report it
{
private:
static GC gc; // one collector for all Cells
protected:
int val; // the value to output next
virtual void update() {} // the updater; default as no-op
public:
Cell(int initial=0) :val(initial) {}
int next() { update(); return val; } // do one step
// set class allocation ops to use gc
void* operator new(unsigned int sz) { return gc.alloc(sz); }
void operator delete(void*) {}
};
GC Cell::gc = GC(); // static class member initialization
// must be done outside class decl in C++
class Counter : public Cell // output consecutive integers
{
protected:
void update() { ++val; }
public:
Counter(int initial=0) :Cell(initial) {}
};
class Filter : public Cell // transform input from another Cell
{
protected:
Cell* src; // input source
int inp; // last input
void get() { inp = src->next(); }
public:
Filter(Cell* s, int initial=0) :Cell(initial), src(s), inp(0) {}
};
class ModFilter : public Filter // hold last input not divisible by divisor
{
private:
int divisor;
int divides() { return inp % divisor == 0; }
public:
ModFilter(Cell* s, int d) : Filter(s), divisor(d) {}
void update() { do get(); while (divides()); val = inp; }
};
class Sieve : public Filter // Serve as terminal node of a prime number sieve
{
public:
Sieve() :Filter(new Counter(1)) {}
void update() { get(); src = new ModFilter(src, val = inp); }
};
#include
main(int argc, char** argv) // exercise everything a little
{
if (argc < 2)
{
cerr << "error: enter desired number of primes as program argument\n";
exit(1);
}
int n = abs(atoi(argv[1]));
Sieve s;
for (int i = 0; i < n; ++i) cout << s.next() << "\n";
}
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl
--
Doug Lea, Computer Science Dept., SUNY Oswego, Oswego, NY, 13126 (315)341-2367
email: dl@oswego.edu or dl%oswego.edu@nisc.nyser.net
UUCP :...cornell!devvax!oswego!dl or ...rutgers!sunybcs!oswego!dl