Path: utzoo!utgpu!watmath!watdragon!rose!ccplumb
From: ccplumb@rose.waterloo.edu (Colin Plumb)
Newsgroups: comp.arch
Subject: Re: IEEE FP denorms and Deming's Arithmetics With Variable Precision
Message-ID: <16893@watdragon.waterloo.edu>
Date: 2 Oct 89 22:41:46 GMT
References: 
Sender: daemon@watdragon.waterloo.edu
Reply-To: ccplumb@rose.waterloo.edu (Colin Plumb)
Organization: U. of Waterloo, Ontario
Lines: 61

(I'm going a bit out on a limb, since someone with more experience than I
may prove all my ideas total nonsense, but this is what I learned in
conversation with a member of the IEEE 754 standards committee.)

In article  aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) writes:
> Deming shows how this tradeoff moves the complexity of coding
> reliable numerical software from avoiding overflow, to handling
> roundoff.  
>     IE. reduced precision makes rounding error analysis more
> complicated.

True.  There are some really useful axioms binary floating-point obeys
that, say, IBM-style base-16 arithmetic doesn't.  E.g. (a+b)/2 lies in the
closed interval between a and b.  In base-16, if a and b both have the high
bit of the high 4-bit digit set, then adding them shifts the sum right by
4 bits at once, dropping 3 extra bits off the bottom.  Dividing by 2 then
shifts 0's back in, and the result can land outside [a,b].  The same thing
can happen anywhere an addition step can lose more than 1 bit of mantissa,
such as with variable-size exponent encodings.
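
To make the base-16 failure concrete, here's a small C sketch of my own
(not taken from any real IBM implementation): hex_trunc() fakes a base-16
format with 6 hex digits of fraction and truncation.  With a and b both
just under 1.0 and the high bit of their high hex digit set, the simulated
midpoint lands below both inputs, while the IEEE-binary midpoint stays
between them.

    #include <stdio.h>
    #include <math.h>

    /* Truncate a positive double to 6 significant hex digits with a
     * base-16 exponent -- a crude stand-in for hex floating point. */
    static double hex_trunc(double x)
    {
        int E = 0;
        while (x >= 1.0)       { x /= 16.0; E++; }  /* normalise so that */
        while (x < 1.0 / 16.0) { x *= 16.0; E--; }  /* 1/16 <= x < 1     */
        x = floor(x * 0x1000000) / 0x1000000;       /* keep 24 bits      */
        return x * ldexp(1.0, 4 * E);               /* scale back by 16^E */
    }

    int main(void)
    {
        double a = 0xFFFFFF / 16777216.0;   /* 0.FFFFFF * 16^0 */
        double b = 0xFFFFFD / 16777216.0;   /* 0.FFFFFD * 16^0 */

        double mid_hex = hex_trunc(hex_trunc(a + b) / 2.0);
        double mid_bin = (a + b) / 2.0;     /* IEEE double, round to nearest */

        printf("b       = %.9f\n", b);
        printf("a       = %.9f\n", a);
        printf("hex mid = %.9f   <- below both a and b\n", mid_hex);
        printf("bin mid = %.9f   <- between b and a\n", mid_bin);
        return 0;
    }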

> QUESTION:
>     Don't the same arguments apply to IEEE Floating Point with
> denormalized numbers?  Ie. don't denormalized numbers complicate
> roundoff error analysis in the same way reduced precision complicates
> the other arithmetics?

Surprisingly, no... they improve things!  I've seen a letter from no less
eminent an authority than Our Lord Knuth retracting his opposition to
denormalised numbers.  This is because denormalised numbers let you add
and subtract near the lower end of the expressible range without losing
absolute precision.  Consider a representation without denormalised
numbers.  There is some minimum exponent, -min, so the smallest scale
factor is 2^-min, which can be multiplied by a mantissa from 1.000...000
to 1.111...111.  The difference made by a 1 in the least significant bit
of the mantissa is 2^-min * 0.000...001 = 2^-(min+mantsize).  You can add
and subtract a lot of these units, but once you get down to 1.0 x 2^-min,
the next step is a jump of 2^-min all the way to zero.  Rather annoying!
Denormalised numbers let you express the difference
between any two representable numbers with as much absolute accuracy
as the least precise of the inputs.  Rather useful for fiddling with the
last few bits of error term in some messy polynomial approximation or whatever.
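
A minimal C sketch of my own (assuming IEEE doubles with gradual
underflow, i.e. denormals not flushed to zero): the difference of two
adjacent numbers at the very bottom of the normalised range is itself
representable, as a denormal, so x != y and x - y == 0 can never both
hold.

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        double x = DBL_MIN;             /* smallest normalised double, 2^-1022    */
        double y = nextafter(x, 1.0);   /* the next representable number above it */

        double d = y - x;               /* 2^-1074: the smallest denormal.  With a
                                           flush-to-zero scheme this would be 0
                                           even though x != y.                    */

        printf("x         = %g\n", x);
        printf("y         = %g\n", y);
        printf("y - x     = %g\n", d);
        printf("x + (y-x) == y?  %s\n", (x + d == y) ? "yes" : "no");
        return 0;
    }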

> Deming suggests a sticky register which tracks the least relative
> precision ever used in the calculation of intermediate results, which
> will give you worst-case rounding error.
>     Would such a register be worthwhile tracking the most extremely
> denormalized IEEE FP number encountered?
>     Does anyone do this sort of thing?

I don't know... generally those who are really concerned about such things
do interval arithmetic, carrying two bounds at every stage between which the
true answer is guaranteed to lie.  There are problems with covariance (even
if x is only known as x +/- epsilon, x-x is exactly zero, yet interval
arithmetic reports 0 +/- 2*epsilon), but it provides good worst-case bounds.
Addition and multiplication do rather different things to error bounds.
For the former, an absolute error bound is best; for the latter, a relative
one.  Mixing the two leads to all sorts of messy analysis.
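
A toy interval-arithmetic sketch of my own, to make the covariance point
concrete (directed rounding omitted for brevity -- a real implementation
would round the lower bound down and the upper bound up):

    #include <stdio.h>

    struct ival { double lo, hi; };     /* the true value lies in [lo, hi] */

    static struct ival ival_sub(struct ival a, struct ival b)
    {
        /* worst case: smallest a minus largest b, and vice versa */
        struct ival r = { a.lo - b.hi, a.hi - b.lo };
        return r;
    }

    int main(void)
    {
        double e = 1e-9;
        struct ival x = { 1.0 - e, 1.0 + e };   /* x = 1 +/- epsilon */

        struct ival d = ival_sub(x, x);         /* "x - x" */
        printf("x - x lies in [%g, %g]\n", d.lo, d.hi);  /* [-2e, +2e], not 0 */
        return 0;
    }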

This is one of the reasons that specifying the rounding mode in the
instruction rather than in a mode register is A Good Thing.
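
A sketch of my own using C99's <fenv.h> (which is a mode-register
interface) to show the nuisance: getting a lower and an upper bound on a
single sum means flipping the global rounding mode around every operation.

    #include <stdio.h>
    #include <math.h>
    #include <fenv.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        volatile double a = 1.0;
        volatile double b = ldexp(1.0, -60);    /* far below a's last bit */
        volatile double lo, hi;

        fesetround(FE_DOWNWARD); lo = a + b;    /* lower bound on the sum */
        fesetround(FE_UPWARD);   hi = a + b;    /* upper bound on the sum */
        fesetround(FE_TONEAREST);               /* restore the default    */

        printf("lo = %.17g\n", lo);             /* 1                      */
        printf("hi = %.17g\n", hi);             /* 1.0000000000000002     */
        return 0;
    }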
-- 
	-Colin