Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 alpha 4/15/85; site natmlab.OZ
Path: utzoo!watmath!clyde!cbosgd!ihnp4!qantel!dual!lll-crg!gymble!umcp-cs!seismo!munnari!natmlab!rolf
From: rolf@natmlab.OZ (Rolf Turner)
Newsgroups: net.math.stat
Subject: The function 'hist' in S.
Message-ID: <296@natmlab.OZ>
Date: Thu, 26-Sep-85 03:28:37 EDT
Article-I.D.: natmlab.296
Posted: Thu Sep 26 03:28:37 1985
Date-Received: Sun, 29-Sep-85 07:00:43 EDT
Organization: CSIRO Maths and Stats & Applied Physics, Sydney, Australia
Lines: 54
Keywords: density

I've noticed that the hist function in S does something which seems
wrong to me. When the argument 'scale' is TRUE, the write-up says
that hist produces counts on a "density scale". It seems to me that
if you're using the word "density" this should imply that the
histogram integrates to 1. Instead, hist produces counts that SUM
to 1. (However, the i-th count is NOT the estimated probability of an
observation lying in the i-th interval.) Explicitly,

                     c(i)
    count(i) = -----------------
                   sum c(j)
                    j

where

                    n(i)/n
    c(i)     = -----------------,    n = sum n(j),    w = sum w(j) .
                    w(i)/w                j                j

Observe that the denominators n and w cancel in the final count, and
so are irrelevant. (Note: n(i) is the i-th count; w(i) is the width
of the i-th interval.)

I've decided to replace the old count(i)'s by the more sensible

                     n(i)
    h(i)     = -----------------
                    n*w(i)

so that the integral of the histogram,

    sum h(j)*w(j),
     j

is one. Moreover, the area of the i-th rectangle, h(i)*w(i), is now
the estimated probability of an observation lying in the i-th
interval. Observe that the old

                     h(i)
    count(i) = -----------------
                   sum h(j)
                    j

The change was made by modifying the subroutine hhcntz, in hhcntz.r,
by replacing the "if (idens)" clause by the following:

    if (idens) {    # change count into density if requested
        sumc = 0.
        do i = 1,nclass {
            sumc = sumc+class(i)
        }
        do i = 1,nclass {
            class(i) = class(i)/(sumc*(cbreak(i+1)-cbreak(i)))
        }
    }
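
The difference between the two scalings can be checked numerically. What follows is only a sketch in Python, not part of S or of hhcntz.r: the function names density_heights and old_scaled_counts are hypothetical, and the counts and break points are made up for illustration. It assumes the old scaling is exactly the count(i) formula given above.

```python
def class_widths(breaks):
    """Widths w(i) of the intervals defined by consecutive break points."""
    return [breaks[i + 1] - breaks[i] for i in range(len(breaks) - 1)]


def density_heights(counts, breaks):
    """Proposed scaling: h(i) = n(i) / (n * w(i))."""
    n = sum(counts)
    return [ni / (n * wi) for ni, wi in zip(counts, class_widths(breaks))]


def old_scaled_counts(counts, breaks):
    """Old scaling: c(i) = (n(i)/n) / (w(i)/w), count(i) = c(i) / sum c(j)."""
    n = sum(counts)
    widths = class_widths(breaks)
    w = sum(widths)
    c = [(ni / n) / (wi / w) for ni, wi in zip(counts, widths)]
    total = sum(c)
    return [ci / total for ci in c]


# Illustrative data only: three classes of unequal width.
counts = [2, 5, 3]
breaks = [0.0, 1.0, 3.0, 4.0]
widths = class_widths(breaks)

h = density_heights(counts, breaks)
old = old_scaled_counts(counts, breaks)

# Area under each histogram: sum of height * width over the classes.
area_new = sum(hi * wi for hi, wi in zip(h, widths))
area_old = sum(oi * wi for oi, wi in zip(old, widths))

print("new heights:", h, "area:", area_new)
print("old counts: ", old, "sum:", sum(old), "area:", area_old)
```

With unequal widths the old values sum to one but the area under them does not equal one, whereas each h(i)*w(i) equals n(i)/n. When all the widths are equal, the old count(i) reduces to n(i)/n, so the old and new scalings differ only by the constant factor 1/w(i).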