
Basic Concept of Probability Distributions 3: Geometric Distribution


PMF
Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a success occurs. If we let X equal the number of failures before the first success, then the geometric distribution mass function is f(x; p) =\Pr(X=x) = (1-p)^{x}p for x=0, 1, 2, \cdots.
Proof:
\begin{align*} \sum_{x=0}^{\infty}f(x; p) &= \sum_{x=0}^{\infty}(1-p)^{x}p\\ &= p\sum_{x=0}^{\infty}(1-p)^{x}\\ & = p\cdot {1\over 1-(1-p)}\\ & = 1 \end{align*}
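As a quick numerical check, R's built-in dgeom uses this same failure-count parametrization, so the PMF and its sum can be verified directly (the choice p = 0.3 below is arbitrary). R code:
p <- 0.3
x <- 0:100
# dgeom(x, p) implements (1 - p)^x * p
all.equal(dgeom(x, p), (1 - p)^x * p)
# [1] TRUE
# the series truncated at x = 100 already sums to 1 up to machine precision
sum(dgeom(x, p))
# [1] 1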
Mean
The expected value is \mu = E[X] = {1-p\over p}
Proof:
First, we know that \sum_{x=0}^{\infty}p^x = {1\over 1-p} where 0 < p < 1. Differentiating both sides with respect to p gives \begin{align*} {d\over dp}\sum_{x=0}^{\infty}p^x &= \sum_{x=1}^{\infty}xp^{x-1}\\ &= {1\over(1-p)^2} \end{align*} The expected value is therefore \begin{align*} E[X] &= \sum_{x=0}^{\infty}x(1-p)^{x}p\\ &=p(1-p)\sum_{x=1}^{\infty}x(1-p)^{x-1}\\ &= p(1-p){1\over(1-(1-p))^2}\\ &= {1-p\over p} \end{align*}
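This closed form can be checked numerically by truncating the series (again, p = 0.3 is an arbitrary choice). R code:
p <- 0.3
x <- 0:1000
# E[X] as a truncated series versus the closed form (1 - p)/p
sum(x * dgeom(x, p))
# [1] 2.333333
(1 - p) / p
# [1] 2.333333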
Variance
The variance is \sigma^2 = \mbox{Var}(X) = {1-p\over p^2}
Proof:
\begin{align*} E\left[X^2\right] &=\sum_{x=0}^{\infty}x^2(1-p)^{x}p\\ &= (1-p)\sum_{x=1}^{\infty}x^2(1-p)^{x-1}p \end{align*}
Rewrite the right-hand summation as
\begin{align*} \sum_{x=1}^{\infty} x^2(1-p)^{x-1}p&= \sum_{x=1}^{\infty} (x-1+1)^2(1-p)^{x-1}p\\ &= \sum_{x=1}^{\infty} (x-1)^2(1-p)^{x-1}p + \sum_{x=1}^{\infty} 2(x-1)(1-p)^{x-1}p + \sum_{x=1}^{\infty} (1-p)^{x-1}p\\ &= E\left[X^2\right] + 2E[X] + 1\\ &= E\left[X^2\right] + {2-p\over p} \end{align*}
where the third equality follows by reindexing each sum with y = x - 1. Thus E\left[X^2\right] = (1-p)E\left[X^2\right] + {(1-p)(2-p) \over p} and solving for E\left[X^2\right] gives E\left[X^2\right]= {(1-p)(2-p)\over p^2}
So the variance is
\begin{align*} \mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\ &= {(1-p)(2-p)\over p^2} - {(1-p)^2\over p^2}\\ &= {1-p\over p^2} \end{align*}
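As with the mean, the closed form can be checked against a truncated series (p = 0.3 is an arbitrary choice). R code:
p <- 0.3
x <- 0:1000
m <- sum(x * dgeom(x, p))       # E[X]
# Var(X) by definition versus the closed form (1 - p)/p^2
sum((x - m)^2 * dgeom(x, p))
# [1] 7.777778
(1 - p) / p^2
# [1] 7.777778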
Examples

1. Let X be geometrically distributed with probability parameter p={1\over2}. Determine the expected value \mu, the standard deviation \sigma, and the probability P\left(|X-\mu| \geq 2\sigma\right). Compare with Chebyshev's Inequality.

Solution:
The geometric distribution mass function is f(x; p) = (1-p)^{x}p,\ x=0, 1, 2, \cdots The expected value is \mu = {1-p\over p} = 1 and the standard deviation is \sigma = \sqrt{1-p\over p^2} = \sqrt{2} \approx 1.414214 Since X takes only nonnegative integer values, |X-\mu| \geq 2\sigma \approx 2.828427 holds exactly when X \geq 4, so the probability that X takes a value more than two standard deviations from \mu is P\left(|X-\mu| \geq 2\sigma\right) = P(X\geq 4) = \left({1\over2}\right)^4 = 0.0625 R code:
1 - sum(dgeom(c(0:3), 1/2))
# [1] 0.0625 
Chebyshev's Inequality gives only the much weaker bound P\left(|X - \mu| \geq 2\sigma\right) \leq {1\over4} = 0.25
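A quick simulation shows how loose this bound is in practice (the sample size 10^6 and the seed are arbitrary choices). R code:
set.seed(1)
x <- rgeom(1e6, 1/2)
# empirical estimate of P(|X - mu| >= 2*sigma)
mean(abs(x - 1) >= 2 * sqrt(2))
# close to the exact value 0.0625, well below the Chebyshev bound 0.25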
2. A die is thrown until one gets a 6. Let V be the number of throws used. What is the expected value of V? What is the variance of V?

Solution:
The PMF of the geometric distribution is f(x; p) = (1-p)^xp,\ x = 0, 1, 2, \cdots where p = {1\over 6}. Let X = V-1, the number of failures before the first 6, so that X follows this geometric distribution. The expected value of V is
\begin{align*} E[V] &= E[X+1]\\ &= E[X] + 1\\ &= {1-p\over p} + 1\\ &= {1-{1\over6} \over {1\over6}} + 1\\ &= 6 \end{align*}
The variance of V is
\begin{align*} \mbox{Var}(V) &= \mbox{Var}(X+1)\\ &= \mbox{Var}(X)\\ &= {1-p\over p^2}\\ &= {1-{1\over 6} \over \left({1\over6}\right)^2}\\ &= 30 \end{align*}
Note that V follows another form of the geometric distribution, the so-called shifted geometric distribution (i.e. the variable counts the number of trials required rather than the number of failures).
By the above process we can see that the expected value of the shifted geometric distribution is \mu = {1\over p} and the variance of the shifted geometric distribution is \sigma^2 = {1-p\over p^2}
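These two formulas are easy to confirm by simulation, since V = X + 1 where X is geometric with p = 1/6 (the sample size 10^6 and the seed are arbitrary choices). R code:
set.seed(1)
v <- rgeom(1e6, 1/6) + 1   # number of throws until the first 6
mean(v)                    # should be close to E[V] = 6
var(v)                     # should be close to Var(V) = 30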
3. Assume W is geometrically distributed with probability parameter p. What is P(W < n)?

Solution:
Since W \geq n exactly when the first n trials are all failures, P(W \geq n) = (1-p)^n, and therefore \begin{align*} P(W < n) &= 1 - P(W \geq n)\\ &= 1-(1-p)^n \end{align*}
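This agrees with R's geometric CDF, since P(W < n) = P(W \leq n-1) (the values p = 1/6 and n = 5 below are arbitrary choices). R code:
p <- 1/6
n <- 5
pgeom(n - 1, p)   # P(W <= n - 1) = P(W < n)
# [1] 0.5981224
1 - (1 - p)^n
# [1] 0.5981224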

4. In order to test whether a given die is fair, it is thrown until a 6 appears, and the number n of throws is counted. How large should n be before we can reject the null hypothesis H_0: \mbox{the die is fair} against the alternative hypothesis H_1: \mbox{the probability of having a 6 is less than 1/6} at significance level 5\%?

Solution:
The probability of having to use at least n throws given H_0 (i.e. the significance probability) is P = \left(1 - {1\over 6}\right)^n = \left({5\over 6}\right)^n We will reject H_0 if P < 0.05. R code:
n <- 1
repeat {
  p <- (5/6)^n          # significance probability for n throws
  if (p < 0.05) break   # stop at the first n with p < 0.05
  n <- n + 1
}
n
# [1] 17
That is, we have to reject H_0 if n is at least 17.
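The loop is not really necessary: solving \left({5\over6}\right)^n < 0.05 directly gives the same threshold. R code:
# smallest n with (5/6)^n < 0.05
ceiling(log(0.05) / log(5/6))
# [1] 17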



References
  1. Ross, S. (2010). A First Course in Probability (8th Edition). Chapter 4. Pearson. ISBN: 978-0-13-603313-4.
  2. Brink, D. (2010). Essentials of Statistics: Exercises. Chapter 5 & 10. ISBN: 978-87-7681-409-0.


