Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 alpha 4/15/85; site natmlab.OZ
Path: utzoo!watmath!clyde!cbosgd!ihnp4!qantel!dual!lll-crg!gymble!umcp-cs!seismo!munnari!natmlab!rolf
From: rolf@natmlab.OZ (Rolf Turner)
Newsgroups: net.math.stat
Subject: The function 'hist' in S.
Message-ID: <296@natmlab.OZ>
Date: Thu, 26-Sep-85 03:28:37 EDT
Article-I.D.: natmlab.296
Posted: Thu Sep 26 03:28:37 1985
Date-Received: Sun, 29-Sep-85 07:00:43 EDT
Organization: CSIRO Maths and Stats & Applied Physics, Sydney, Australia
Lines: 54
Keywords: density

I've noticed that the hist function in S does something which
seems wrong to me.  When the argument 'scale' is TRUE, the
write-up says that hist produces counts on a "density scale".
It seems to me that if you're using the word "density" this
should imply that the histogram integrates to 1.  Instead,
hist produces counts that SUM to 1.  (However, the i-th count
is NOT the estimated probability of an observation lying in the
i-th interval.)

Explicitly

                          c(i)
	count(i) =  -----------------
                        sum c(j)
                         j

where
                         n(i)/n
            c(i) =  -----------------,      n = sum n(j),   w = sum w(j)  .
                         w(i)/w                  j               j

Observe that the denominators n and w cancel in the final count, and so are
irrelevant.  (Note: n(i) is the i-th count; w(i) is the width of the i-th
interval.)

I've decided to replace the old count(i)'s by the more sensible

                          n(i)
	    h(i) =  -----------------
                         n*w(i)

so that the integral of the histogram, = sum h(j)*w(j), is one.
                                          j

Moreover, the area of the i-th rectangle, h(i)*w(i), is now the estimated
probability of an observation lying in the i-th interval.
Observe that the old count is just h(i) rescaled to sum to one
(c(i) = w*h(i), so the factor w cancels):

                           h(i)
	count(i) =  -------------------
                         sum h(j)
                          j
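
To make the difference concrete, here is a small numerical check of the
formulas above.  It is only a sketch in Python, not S or the hist source;
the function names old_scale and density_scale and the example counts and
widths are made up for illustration.

```python
# Illustrative sketch only: names and data are hypothetical, not from S.

def old_scale(counts, widths):
    """Old behaviour: c(i) = (n(i)/n)/(w(i)/w), then rescale to sum to 1."""
    n = sum(counts)
    w = sum(widths)
    c = [(ni / n) / (wi / w) for ni, wi in zip(counts, widths)]
    total = sum(c)
    return [ci / total for ci in c]

def density_scale(counts, widths):
    """Proposed behaviour: h(i) = n(i) / (n * w(i))."""
    n = sum(counts)
    return [ni / (n * wi) for ni, wi in zip(counts, widths)]

counts = [2, 6, 2]          # n(i), made-up data
widths = [1.0, 2.0, 1.0]    # w(i), deliberately unequal

old = old_scale(counts, widths)
h = density_scale(counts, widths)

# Area under each version of the histogram: sum of height * width.
area_old = sum(o * wi for o, wi in zip(old, widths))
area_new = sum(hi * wi for hi, wi in zip(h, widths))

print("old heights:", old, "sum:", sum(old), "area:", area_old)
print("new heights:", h, "area:", area_new)
```

With unequal widths the old heights sum to one but their area does not,
while the new heights have area one and h(i)*w(i) recovers n(i)/n.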

The change was made by modifying the subroutine hhcntz, in hhcntz.r,
by replacing the "if (idens)" clause by the following:

if (idens) {	# change count into density if requested
	sumc = 0.
	do i = 1,nclass { sumc = sumc+class(i) }

	do i = 1,nclass {
		class(i) = class(i)/(sumc*(cbreak(i+1)-cbreak(i)))
	}
}
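
For readers without the S source to hand, the same clause can be sketched
in Python.  This is a transcription for illustration, not part of S; the
function name counts_to_density is hypothetical, and it assumes (as the
Ratfor above implies) that cbreak holds nclass+1 break points so that
cbreak[i+1]-cbreak[i] is the width of the i-th class.

```python
# Illustrative transcription of the Ratfor clause; not part of S.

def counts_to_density(class_counts, cbreak):
    """Turn raw class counts into densities n(i) / (n * w(i)).

    class_counts: list of nclass counts.
    cbreak: list of nclass + 1 increasing break points (an assumption).
    """
    if len(cbreak) != len(class_counts) + 1:
        raise ValueError("expected one more break point than classes")
    sumc = float(sum(class_counts))   # total number of observations, n
    return [
        class_counts[i] / (sumc * (cbreak[i + 1] - cbreak[i]))
        for i in range(len(class_counts))
    ]

# Made-up example: three classes on [0,1), [1,3), [3,4).
dens = counts_to_density([2, 6, 2], [0.0, 1.0, 3.0, 4.0])
print(dens)
```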