Basic Concept of Probability Distributions 1: Binomial Distribution


PMF
If the random variable $X$ counts the number of successes in $n$ independent trials, each succeeding with probability $p$, then $X$ follows the binomial distribution with parameters $n$ and $p$, written $X \sim B(n, p)$.
The probability of getting exactly $x$ successes in $n$ trials is given by the probability mass function $$f(x; n, p) = \Pr(X=x) = {n\choose x}p^{x}(1-p)^{n-x}$$ for $x=0, 1, \ldots, n$, where ${n\choose x} = {n!\over(n-x)!\,x!}$.
Proof that the probabilities sum to 1:
$$
\begin{align*}
\sum_{x=0}^{n}f(x; n, p) &= \sum_{x=0}^{n}{n\choose x}p^{x}(1-p)^{n-x}\\
&= [p + (1-p)]^{n}\;\;\quad\quad \mbox{(binomial theorem)}\\
&= 1
\end{align*}
$$
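As a quick numerical sanity check of the PMF and of the fact that it sums to 1, the sketch below evaluates the formula directly with base R's `choose()` and compares it with the built-in `dbinom()`. The helper name `binom_pmf` and the values $n=10$, $p=0.3$ are arbitrary choices for illustration, not part of the derivation.

```r
# Hypothetical helper: the binomial PMF written straight from the formula
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 10
p <- 0.3
x <- 0:n
probs <- binom_pmf(x, n, p)

# The hand-written formula agrees with dbinom() up to floating-point error
all.equal(probs, dbinom(x, n, p))
# [1] TRUE

# Summed over x = 0, ..., n the probabilities give 1, as the binomial theorem predicts
all.equal(sum(probs), 1)
# [1] TRUE
```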
Mean
The expected value is $$\mu = E[X] = np$$
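Proof: we first derive a general formula for the $k$-th moment $E\left[X^k\right]$, $k\geq1$, which also yields the variance below.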
$$
\begin{align*}
E\left[X^k\right] &= \sum_{x=0}^{n}x^{k}{n\choose x}p^{x}(1-p)^{n-x}\\
&= \sum_{x=1}^{n}x^{k}{n\choose x}p^{x}(1-p)^{n-x}\quad\quad\quad (\mbox{the}\ x=0\ \mbox{term vanishes for}\ k\geq1)\\
&= np\sum_{x=1}^{n}x^{k-1}{n-1\choose x-1}p^{x-1}(1-p)^{n-x}\quad\quad\quad (\mbox{identity}\ x{n\choose x} = n{n-1\choose x-1})\\
&= np\sum_{y=0}^{n-1}(y+1)^{k-1}{n-1\choose y}p^{y}(1-p)^{n-1-y}\quad(\mbox{substituting}\ y=x-1)\\
&= npE\left[(Y + 1)^{k-1}\right] \quad\quad\quad \quad\quad\quad \quad\quad\quad\quad\quad (Y\sim B(n-1, p))
\end{align*}
$$
The third step uses the identity
$$
\begin{align*}
x{n\choose x} &= {x\cdot n!\over(n-x)!x!}\\
& = {n!\over(n-x)!(x-1)!}\\
&= n{(n-1)!\over[(n-1)-(x-1)]!(x-1)!}\\
&= n{n-1\choose x-1}
\end{align*}
$$
Setting $k=1$, so that $(Y+1)^{0}=1$ and hence $E\left[(Y+1)^{0}\right]=1$, we have $$E[X] = np$$
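The moment recursion $E\left[X^k\right] = npE\left[(Y+1)^{k-1}\right]$ with $Y\sim B(n-1,p)$, and the identity it relies on, can be checked numerically by computing both sides as finite sums over the PMF. This is a sketch; the helper `expect_binom` and the values $n=12$, $p=0.4$ are illustrative assumptions.

```r
# Hypothetical helper: E[g(X)] for X ~ B(n, p), computed as a finite sum over the PMF
expect_binom <- function(g, n, p) {
  x <- 0:n
  sum(g(x) * dbinom(x, n, p))
}

n <- 12
p <- 0.4

# Compare E[X^k] with np * E[(Y + 1)^(k - 1)], where Y ~ B(n - 1, p)
for (k in 1:4) {
  lhs <- expect_binom(function(x) x^k, n, p)
  rhs <- n * p * expect_binom(function(y) (y + 1)^(k - 1), n - 1, p)
  cat(sprintf("k = %d: E[X^k] = %.6f, np E[(Y+1)^(k-1)] = %.6f\n", k, lhs, rhs))
}

# The identity x * choose(n, x) = n * choose(n - 1, x - 1) for x = 1, ..., n
x <- 1:n
isTRUE(all.equal(x * choose(n, x), n * choose(n - 1, x - 1)))
# [1] TRUE
```

For $k=1$ both columns equal $np = 4.8$, matching the mean formula.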
Variance
The variance is $$\sigma^2 = \mbox{Var}(X) = np(1-p)$$
Proof (the moment formula with $k=2$ gives $E\left[X^2\right] = npE[Y+1]$, where $Y\sim B(n-1, p)$):
$$
\begin{align*}
\mbox{Var}(X) &= E\left[X^2\right] - E[X]^2\\
&= npE[Y+1] - n^2p^2\\
& = np\left(E[Y] + 1\right) - n^2p^2\\
& = np[(n-1)p + 1] - n^2p^2\quad\quad (E[Y] = (n-1)p\ \mbox{since}\ Y\sim B(n-1, p))\\
&= np - np^2\\
&= np(1-p)
\end{align*}
$$
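The same kind of check works for the variance: compute $E[X]$ and $E\left[X^2\right]$ as sums over the PMF and compare them with $np$, $np[(n-1)p+1]$ and $np(1-p)$. The values $n=20$, $p=0.35$ are arbitrary; this is only a numerical sketch of the result, not a proof.

```r
n <- 20
p <- 0.35
x <- 0:n
f <- dbinom(x, n, p)

mu <- sum(x * f)                   # E[X]
second_moment <- sum(x^2 * f)      # E[X^2]
variance <- second_moment - mu^2   # Var(X) = E[X^2] - E[X]^2

all.equal(mu, n * p)
# [1] TRUE
all.equal(second_moment, n * p * ((n - 1) * p + 1))   # E[X^2] = np[(n-1)p + 1]
# [1] TRUE
all.equal(variance, n * p * (1 - p))
# [1] TRUE
```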
Examples

1. Let $X$ be binomially distributed with parameters $n=10$ and $p={1\over2}$. Determine the expected value $\mu$, the standard deviation $\sigma$, and the probability $P\left(|X-\mu| \geq 2\sigma\right)$. Compare the result with the bound given by Chebyshev's inequality.

Solution:
The binomial mass function is $$f(x) ={n\choose x} p^x q^{n-x},\quad x=0, 1, \ldots, n,$$ where $q=1-p$. The expected value and the standard deviation are $$E[X] = np=5,\quad \sigma = \sqrt{npq} = \sqrt{2.5} = 1.581139$$ The probability that $X$ takes a value at least two standard deviations away from $\mu$ is
$$
\begin{align*}
P\left(|X-\mu| \geq 2\sigma\right) &= P\left(|X-5| \geq 3.162278\right)\\
&= P(X\leq 1) + P(X \geq9)\\
&= \sum_{x=0}^{1}{10\choose x}p^{x}(1-p)^{10-x} + \sum_{x=9}^{10}{10\choose x}p^{x}(1-p)^{10-x}\\
& = {22\over1024} = 0.021484375
\end{align*}
$$
R code:
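# P(X <= 1) + P(X >= 9) from the PMF; 1 - P(X <= 8) gives P(X >= 9)
sum(dbinom(0:1, 10, 0.5)) + 1 - sum(dbinom(0:8, 10, 0.5))
# [1] 0.02148437
# the same probability from the CDF (differs only by floating-point rounding)
pbinom(1, 10, 0.5) + 1 - pbinom(8, 10, 0.5)
# [1] 0.02148438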
Chebyshev's inequality gives only the much weaker bound $$P\left(|X - \mu| \geq 2\sigma\right) \leq {1\over2^2} = 0.25$$
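To see how conservative Chebyshev's bound is for this distribution, the sketch below computes the exact tail probability $P\left(|X-\mu|\geq k\sigma\right)$ for $X\sim B(10, 1/2)$ at several multipliers $k$, next to the bound $1/k^2$. The helper `tail_prob` and the chosen $k$ values are illustrative assumptions.

```r
n <- 10
p <- 0.5
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
x <- 0:n

# Hypothetical helper: exact P(|X - mu| >= k * sigma), summing the PMF over
# every x that lies at least k standard deviations from the mean
tail_prob <- function(k) sum(dbinom(x[abs(x - mu) >= k * sigma], n, p))

k <- c(1.5, 2, 2.5, 3)
data.frame(k = k,
           exact = sapply(k, tail_prob),
           chebyshev = 1 / k^2)
```

The row for $k=2$ reproduces the value $22/1024$ computed above.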
2. What is the probability $P_1$ of getting at least six heads when tossing a fair coin ten times?

Solution:
$$
\begin{align*}
P(X \geq 6) &= \sum_{x=6}^{10}{10\choose x}0.5^{x}0.5^{10-x}\\
&= {210+120+45+10+1\over1024} = {386\over1024}\\
&= 0.3769531
\end{align*}
$$
R code:
1 - pbinom(5, 10, 0.5)
# [1] 0.3769531
sum(dbinom(c(6:10), 10, 0.5))
# [1] 0.3769531 
3. What is the probability $P_2$ of getting at least 60 heads when tossing a fair coin 100 times?

Solution:
$$
\begin{align*}
P(X \geq 60) &= \sum_{x=60}^{100}{100\choose x}0.5^{x}0.5^{100-x}\\
&= 0.02844397
\end{align*}
$$
R code:
1 - pbinom(59, 100, 0.5)
# [1] 0.02844397
sum(dbinom(c(60:100), 100, 0.5))
# [1] 0.02844397 
Alternatively, we can use the normal approximation with a continuity correction (a common rule of thumb is that it is adequate when $np > 5$ and $n(1-p) > 5$). Here $\mu = np=50$ and $\sigma = \sqrt{np(1-p)} = \sqrt{25} = 5$.
$$
\begin{align*}
P(X \geq 60) &= 1 - P(X \leq 59)\\
&= 1- \Phi\left({59.5-50\over \sqrt{25}}\right)\\
&= 1-\Phi(1.9)\\
&= 0.02871656
\end{align*}
$$
R code:
1 - pnorm(1.9)
# [1] 0.02871656 
4. What is the probability $P_3$ of getting at least 600 heads when tossing a fair coin 1000 times?

Solution:
$$
\begin{align*}
P(X \geq 600) &= \sum_{x=600}^{1000}{1000\choose x} 0.5^{x} 0.5^{1000-x}\\
&= 1.364232\times10^{-10}
\end{align*}
$$
R code:
sum(dbinom(c(600:1000), 1000, 0.5))
# [1] 1.364232e-10 
Alternatively, we can again use the normal approximation with a continuity correction. Here $\mu = np=500$ and $\sigma = \sqrt{np(1-p)} = \sqrt{250} \approx 15.81$.
$$
\begin{align*}
P(X \geq 600) &= 1 - P(X \leq 599)\\
&= 1- \Phi\left({599.5-500\over \sqrt{250}}\right)\\
&\approx 1-\Phi(6.293)\\
&= 1.557618 \times 10^{-10}
\end{align*}
$$
R code:
1 - pnorm(99.5/sqrt(250))
# [1] 1.557618e-10 
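Examples 2 to 4 ask the same question, the probability of at least 60% heads, for growing $n$. The sketch below puts the three exact answers next to the normal approximation with continuity correction. It uses the `lower.tail = FALSE` argument of `pbinom()` and `pnorm()`, which returns the upper tail directly and avoids the cancellation in `1 - pbinom(...)` when the tail is tiny. The helper name `upper_tail` is hypothetical.

```r
# Hypothetical helper: P(X >= m) for X ~ B(n, p), exactly and via the normal
# approximation with continuity correction
upper_tail <- function(m, n, p = 0.5) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  c(exact  = pbinom(m - 1, n, p, lower.tail = FALSE),          # P(X > m - 1) = P(X >= m)
    normal = pnorm((m - 0.5 - mu) / sigma, lower.tail = FALSE)) # 1 - Phi((m - 0.5 - mu) / sigma)
}

n <- c(10, 100, 1000)
m <- c(6, 60, 600)   # "at least 60% heads" for each n
result <- t(mapply(upper_tail, m, n))
rownames(result) <- paste0("n = ", n)
result
```

The exact column shrinks from about 0.38 to 0.028 to $1.4\times10^{-10}$. Judging from the values in Examples 3 and 4, the normal approximation is off by about 1% at $n=100$ but by roughly 14% at $n=1000$, where the tail probability is tiny.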


