Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site wateng.UUCP
Path: utzoo!watmath!wateng!dclee
From: dclee@wateng.UUCP (David C. Lee)
Newsgroups: net.math
Subject: Re: Pistachio Probabilities
Message-ID: <2772@wateng.UUCP>
Date: Fri, 23-Aug-85 14:49:43 EDT
Article-I.D.: wateng.2772
Posted: Fri Aug 23 14:49:43 1985
Date-Received: Sat, 24-Aug-85 19:14:30 EDT
References: <285@ihnet.UUCP>
Distribution: net
Organization: U of Waterloo, Ontario
Lines: 72

Subject: Re: Pistachio Probabilities
Newsgroups: net.math
Distribution: net
References: <285@ihnet.UUCP>

Here is the original question about Pistachio Probabilities:

>Suppose you begin with a bag containing O openable pistachios,
>and U unopenable pistachios.  A trial consists of selecting a nut at random,
>and eating it (if possible), or returning it to the bag.
>How many trials, on the average, are required to consume all the "openable"
>pistachios?  Express the answer in terms of U and O.
>Any ideas on the variance/distribution of trials(U,O)?

Rather than using Openable & Unopenable (O & 0 is confusing),
I will use Good and Bad.

Suppose we associate a stage of the experiment with the consumption of 
a good pistachio.  For example at the beginning of stage (1), 
we have (G) good ones and (B) bad ones; at stage (2), [G-1,B];
... ;at stage (i), [G-i+1,B]; ... ; and finally at stage (G), [1,B].

Let us analyze this experiment stage-by-stage.
At any stage (i) with [G-i+1,B], we have the following results:
      p = #good / (#good+#bad) = (G-i+1) / (G-i+1 + B)
          - probability of picking a good one at each trial
      q = 1-p
   P(k) = p q^(k-1)
          - probability of finding the first good one at the k-th trial
          - recall geometric distribution
              with expected value E(k) = 1/p
              and variance V(k) = q/(p^2)

So at stage (1), [G,B]   (found and consumed the 1st pistachio)
  the expected number of trials is E(k) = (G+B)/G = 1 + B/G
  with the variance of             V(k) = (B/(G+B)) / (G/(G+B))^2
                                        = (B/G) + (B/G)^2
At stage (2),    [G-1,B] (found and consumed the 2nd pistachio)
  the expected number of trials is E(k) =  1 + B/(G-1)
  with the variance of             V(k) =  (B/(G-1)) + (B/(G-1))^2
...
At stage (G),    [1,B]   (Ha-Ha, the last one!)
  the expected number of trials is E(k) =  1 + B
  with the variance of             V(k) =  (B) + (B)^2

The expected number of trials to consume all pistachio is:
        N = k(1) + k(2) + ... + k(G)
     E(N) = E(k(1))     + ... + E(k(G))
          = { 1 + B/G } + ... + { 1 + B }
          = G + B { 1/G + 1/(G-1) + ... + 1 }           (reducible?)

Since k(1), k(2), ..., k(G) are independent random variables,
(i.e. each stage of the experiment is independent of others)
the variance of N is:
     V(N) = V(k(1))       + V(k(2))     + ... + V(k(G))
          =   B { 1/G     + 1/(G-1)     + ... + 1 } + 
            B^2 { (1/G)^2 + (1/(G-1))^2 + ... + 1 }     (reducible?)

Further comments:

     No doubt about it!  The task of finding a good pistachio 
gets harder with each being consumed.  To get the (i+1)-th pistachio 
we would need an extra effort of B/{(G-i)(G-i+1)} trials 
compared to (i)-th pistachio, i.e. an increase of no more than 100/G% 
at the beginning for large values of G and B, but with the increase 
of as much as 200% at the end for a large B.

     Finally by "Central Limit Theorem", N tends to be normally 
distributed with mean E(N) and variance V(N) as dervied above,
for large values of B and G.

     David C. Lee @ University of Waterloo, Ontario