Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!mailrus!ames!necntc!necis!mrst!sdti!turner
From: turner@sdti.UUCP (Prescott K. Turner)
Newsgroups: comp.lang.c
Subject: Re: Need info on IEEE quad format
Summary: some info
Message-ID: <312@sdti.UUCP>
Date: 19 Sep 88 16:11:36 GMT
Reply-To: turner@sdti.UUCP (Prescott K. Turner, Jr.)
Organization: Software Development Technologies, Sudbury MA
Lines: 63

In article <660016@hpclscu.HP.COM>, shankar@hpclscu.HP.COM (Shankar Unni) writes:
>I need some info on IEEE floating point representation limits to construct
>a <float.h> file for ANSI C. I already have the info for single and double
>floats:
>...

Here are some improvements to your single and double values:

#define FLT_ROUNDS  1           /* is consistent with tie-breaking   */
                                /* to nearest even significand       */

#define DBL_MIN_EXP -1021       /* Because C uses a different        */
#define DBL_MAX_EXP 1024        /* (inferior) model for floating     */
#define FLT_MIN_EXP -125        /* point numbers from IEEE 754, its  */
#define FLT_MAX_EXP 128         /* MIN_EXP and MAX_EXP values are    */
                                /* different.                        */

#define DBL_MIN 2.2250738585072014e-308     /* more accurate      */
#define DBL_MAX 1.7976931348623157e+308     /* than the latest    */
                                            /* draft C standard   */

>The information I need is (for quad-precision (128-bit) floats):

The draft C standard does not provide the figures for IEEE quad-precision
because the IEEE 754 standard prescribes only lower limits for range and
precision of a 'double extended' format.  I will attempt to fill in your
table, based on the quad format which appeared in an early draft of the
IEEE standard, and which is supported by Intel coprocessors.

#define LDBL_MANT_DIG   112     /* no hidden bit */
#define LDBL_EPSILON    3.851859888774471706111955885169855E-34L
#define LDBL_DIG        33
#define LDBL_MIN_EXP    -16381
#define LDBL_MIN        3.362103143112093506262677817321753E-4932L
#define LDBL_MIN_10_EXP -4931
#define LDBL_MAX_EXP    16384
#define LDBL_MAX        1.1897314953572317650857593266280069E+4932L
#define LDBL_MAX_10_EXP 4932

>The magnitude of the smallest de-normalized quad would also be useful...

#define LDBL_DENORM_MIN 1E-4965L

No need for lots of digits here, because the smallest denormalized number
has only 1 bit of precision.

Caveat:  The IEEE standard has strict requirements on decimal-to-binary
conversion for single and double, but even there it permits a little slack
in converting the _MAX and _MIN constants.  You're lucky if you have a
decimal-to-binary conversion routine which will convert the above
representations of LDBL_MAX and LDBL_MIN to the appropriate binary values.
You could even get overflow.  If there is a problem, it's more important
that the constants convert correctly than that they themselves be accurate.

Note that the C standard permits the macro names to be defined as
expressions.  Here's an idea for what might work:

#define FLT_MAX  (ldexp(1-6E-8, FLT_MAX_EXP))
#define FLT_MIN  (ldexp(0.5, FLT_MIN_EXP))
#define DBL_MAX  (ldexp(1-1E-16, DBL_MAX_EXP))
#define DBL_MIN  (ldexp(0.5, DBL_MIN_EXP))
#define LDBL_MAX (ldexpl(1-2E-34L, LDBL_MAX_EXP))
#define LDBL_MIN (ldexpl(0.5L, LDBL_MIN_EXP))
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (508) 443-5779
UUCP:...genrad!mrst!sdti!turner