Path: utzoo!utgpu!watmath!watdragon!rose!ccplumb
From: ccplumb@rose.waterloo.edu (Colin Plumb)
Newsgroups: comp.arch
Subject: Re: IEEE FP denorms and Deming's Arithmetics With Variable Precision
Message-ID: <16893@watdragon.waterloo.edu>
Date: 2 Oct 89 22:41:46 GMT
References:
Sender: daemon@watdragon.waterloo.edu
Reply-To: ccplumb@rose.waterloo.edu (Colin Plumb)
Organization: U. of Waterloo, Ontario
Lines: 61

(I'm going a bit out on a limb, since someone with more experience than I
may prove all my ideas total nonsense, but this is what I learned in
conversation with a member of the IEEE 754 standards committee.)

In article aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) writes:
> Deming shows how this tradeoff moves the complexity of coding
> reliable numerical software from avoiding overflow, to handling
> roundoff.
> IE. reduced precision makes rounding error analysis more
> complicated.

True.  There are some really useful axioms binary floating-point obeys
that, say, IBM-style base-16 stuff doesn't.  E.g. (a+b)/2 lies in the
closed interval between a and b.  In base 16, if a and b both have the
high bit of the high 4-bit digit set, then adding them causes you to
shift by 4 bits at once, dropping 3 off the bottom.  Dividing by 2
causes 0's to be shifted in and makes a mess of things.  This can happen
anywhere you can lose more than 1 bit of mantissa in an addition step,
such as variable-size exponent encodings.

> QUESTION:
> Don't the same arguments apply to IEEE Floating Point with
> denormalized numbers?  Ie. don't denormalized numbers complicate
> roundoff error analysis in the same way reduced precision complicates
> the other arithmetics?

Surprisingly, no... they improve things!  I've seen a letter from no
less eminent an authority than Our Lord Knuth retracting his opposition
to denormalised numbers.  This is because denormalised numbers let you
add and subtract near the lower end of the expressible range without
losing absolute precision.

Consider a representation without denormalised numbers.  There is some
minimum exponent, 2^-min, which can be multiplied by a mantissa from
1.111...111 down to 1.000...000.  The difference made by a 1 in the
least significant bit of the mantissa is 2^-min * 0.000...001, i.e.
2^-(min+mantsize).  You can add and subtract a lot of these units, but
once you get down to 1x2^-min, the jump is 2^-min all the way down to
zero.  Rather annoying!

Denormalised numbers let you express the difference between any two
representable numbers with as much absolute accuracy as the least
precise of the inputs.  Rather useful for fiddling with the last few
bits of error term in some messy polynomial approximation or whatever.
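To make that concrete, here's a little C sketch (my own illustration,
not anything of Deming's; it assumes IEEE doubles and nextafter() from
the C library, and a flush-to-zero machine would print 0 instead of the
tiny denormal):

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        double a = DBL_MIN;            /* smallest normalised double, 2^-1022 */
        double b = nextafter(a, 1.0);  /* next representable number above it  */

        /* With gradual underflow, b - a comes out as the smallest
         * denormal (2^-1074) rather than collapsing to zero, so the
         * difference of two distinct representable numbers stays
         * nonzero even at the very bottom of the range. */
        printf("a     = %g\n", a);
        printf("b - a = %g\n", b - a);
        printf("(b - a) == 0?  %s\n", (b - a == 0.0) ? "yes" : "no");
        return 0;
    }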
> Deming suggests a sticky register which tracks the least relative
> precision ever used in the calculation of intermediate results, which
> will give you worst-case rounding error.
> Would such a register be worthwhile tracking the most extremely
> denormalized IEEE FP number encountered?
> Does anyone do this sort of thing?

I don't know... generally those who are really concerned about such
things do interval arithmetic, keeping two answers at all stages which
the true answer is guaranteed to lie between.  There are problems with
covariance (even if x is x +/- epsilon, x-x is exactly zero, not
0 +/- 2*epsilon), but it provides good worst-case bounds.

Addition and multiplication do rather different things to error bounds.
For the former, an absolute error bound is best; for the latter, a
relative error.  Mixing the two leads to all sorts of messy analysis.

This is one of the reasons that specifying the rounding mode in the
instruction rather than a mode register is A Good Thing.
-- 
	-Colin