Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!mailrus!csd4.milw.wisc.edu!cs.utexas.edu!husc6!ogccse!blake!lgy
From: lgy@blake.acs.washington.edu (Laurence Yaffe)
Newsgroups: comp.lang.c++
Subject: Re: named return values
Message-ID: <3163@blake.acs.washington.edu>
Date: 9 Aug 89 07:15:36 GMT
References: <1826@cmx.npac.syr.edu> <26302@shemp.CS.UCLA.EDU> <6444@columbia.edu>
Reply-To: lgy@newton.phys.washington.edu (Laurence Yaffe)
Organization: University of Washington, Seattle
Lines: 50

In article <6444@columbia.edu> kearns@cs.columbia.edu writes:

>While this is very nice in theory, in practice it can lead to horrible
>performance because of the various temporary matrices that are created.  

	[ various comments about the desirability of explicitly
	  controlling memory use for matrix operations deleted ]

> It is also
>more "honest":  matrices are NOT good candidates for having value semantics
>because their copying time is large.  

>-steve
>(kearns@cs.columbia.edu)

    The claim that frequent copying of matrices causes unacceptable
performance degradation appears to be common dogma, but what real evidence
supports this?  Since most common operations on matrices (multiplication,
diagonalization, decomposition, inversion, ...) involve order N^3 operations
for N x N matrices, while copying is only order N^2, the overhead of copying
will be significant only if (a) matrices are small and copies are very
frequent (compared to other operations), (b) matrices are so large that
memory limitations intervene, or (c) no O(N^3) operations are being
performed.
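
    For what it's worth, the arithmetic is easy to check directly.  The
fragment below is my own illustration (nothing from the article above; the
Matrix class is just a hypothetical stand-in for whatever matrix type you
actually use).  It times one O(N^2) copy against one O(N^3) naive multiply
at a few sizes; once N gets past a handful, the copy is lost in the noise.

// Rough C++ sketch: time one O(N^2) copy against one O(N^3) multiply.
#include <chrono>
#include <cstdio>
#include <initializer_list>
#include <vector>

struct Matrix {
    int n;
    std::vector<double> a;                 // row-major storage
    explicit Matrix(int n_) : n(n_), a((std::size_t)n_ * n_, 1.0) {}
    double&       operator()(int i, int j)       { return a[(std::size_t)i * n + j]; }
    const double& operator()(int i, int j) const { return a[(std::size_t)i * n + j]; }
};

// Naive O(N^3) multiply; the inner loop walks down a column of y.
Matrix multiply(const Matrix& x, const Matrix& y) {
    Matrix z(x.n);
    for (int i = 0; i < x.n; ++i)
        for (int j = 0; j < x.n; ++j) {
            double s = 0.0;
            for (int k = 0; k < x.n; ++k)
                s += x(i, k) * y(k, j);
            z(i, j) = s;
        }
    return z;
}

int main() {
    using namespace std::chrono;
    for (int n : {4, 16, 64, 256}) {
        Matrix a(n), b(n);

        auto t0 = steady_clock::now();
        Matrix c = a;                      // the O(N^2) copy in question
        auto t1 = steady_clock::now();
        Matrix p = multiply(a, b);         // one O(N^3) operation
        auto t2 = steady_clock::now();

        std::printf("N = %4d   copy %8.0f us   multiply %8.0f us   (check %g)\n",
                    n,
                    duration<double, std::micro>(t1 - t0).count(),
                    duration<double, std::micro>(t2 - t1).count(),
                    c(0, 0) + p(0, 0));    // use the results so they aren't optimized away
    }
    return 0;
}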

    In years of my own work, I've never seen real examples of case (c),
and only a few examples of case (a).  Over quite a range of applications,
I've found that the breakeven point where O(N^2) copies become important
is well under N=10, typically 3 or 4.  And for compute intensive
applications with matrices that small, special methods tend to be more
appropriate (fixed dimension types, inline coding, ...).  I have run
into examples in case (b), most recently in a calculation involving
1280 x 1280 matrices, which needed more than 80 MB of swap space!
But this type of problem seems to be largely a thing of the past - unless
you have a very fast machine or the patience to do O(N^3) operations on
1000 x 1000 matrices.

    On all the machines I've used, sequentially accessing all the elements
in a row of a matrix (stored row-major, as in C) is significantly faster than
accessing a column (better locality of reference, faster pointer increment).
And yet surprisingly few canned
matrix multiply routines pre-transpose one of the matrices (or use equivalent
tricks involving an O(N^2) movement of data) in order to take advantage of this
fact.  Absolutely criminal...
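
    The fix costs almost nothing to write.  Continuing my sketch above (again
my own illustration, reusing the hypothetical Matrix type and the same
arithmetic as multiply()), transposing the second operand once turns every
inner loop into a unit-stride sweep over both matrices:

// Same O(N^3) arithmetic as multiply() above, but the second operand is
// transposed once (an O(N^2) data movement) so the inner loop reads both
// matrices row-wise.
Matrix multiply_transposed(const Matrix& x, const Matrix& y) {
    const int n = x.n;

    Matrix yt(n);                          // yt = transpose of y, built once
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            yt(j, i) = y(i, j);

    Matrix z(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int k = 0; k < n; ++k)
                s += x(i, k) * yt(j, k);   // both rows traversed sequentially
            z(i, j) = s;
        }
    return z;
}

On anything with a memory hierarchy the transposed version typically wins
handily for large N, and the extra O(N^2) transpose is exactly the kind of
copy that doesn't matter.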

    Anyone have real data (or just more anecdotal tales) on the significance
of matrix copies in real applications?

-- 
Laurence G. Yaffe		Internet: lgy@newton.phys.washington.edu
University of Washington	  Bitnet: yaffe@uwaphast.bitnet