Bayes' Rule Examples

Bayes' Rule
$$P(\theta|D, M)=\frac{P(D|\theta, M)\cdot P(\theta|M)}{P(D|M)}$$ where
  • $\theta$ denotes the parameter values.
  • $D$ denotes the data.
  • $M$ denotes the model structure.
  • $P(\theta|D, M)$ is the posterior probability, the strength of our belief in $\theta$ under the model $M$ after the data $D$ have been taken into account.
  • $P(D|\theta, M)$ is the likelihood, the probability that the data would be generated by the model $M$ with parameter values $\theta$.
  • $P(\theta|M)$ is the prior probability, the strength of our belief in $\theta$ under the model $M$ before the data $D$ are taken into account.
  • $P(D|M)$ is the evidence, the probability of the data according to the model $M$, obtained by summing (or integrating) the likelihood across all possible parameter values $\theta$, weighted by the strength of belief in those parameter values (i.e. the prior). Specifically,
    • For discrete parameters: $$P(D|M)=\sum_{\theta}P(D|\theta, M)\cdot P(\theta|M)$$
    • For continuous parameters: $$P(D|M)=\int P(D|\theta, M)\cdot P(\theta|M)\ d\theta$$
Coin Flipping Example
Suppose that we flip a coin and believe that there are only three possible values for the coin's bias (denoted $\theta$): $$P(\theta=0.5)=0.5,\ P(\theta=0.25)=0.25,\ P(\theta=0.75)=0.25$$ which serves as the prior probability distribution. The data consist of a specific sequence of flips with 3 heads and 9 tails, so $$P(D|\theta)=\theta^{3}\cdot(1-\theta)^9$$ We can now calculate the posterior probability.
First, we compute the likelihood for each value of $\theta$:
$$P(D|\theta=0.25)=0.25^3\times(1-0.25)^9=1.173198\times10^{-3}$$ $$P(D|\theta=0.5)=0.5^3\times(1-0.5)^9=2.441406\times10^{-4}$$ $$P(D|\theta=0.75)=0.75^3\times(1-0.75)^9=1.609325\times10^{-6}$$ Thus the evidence is $$P(D)=\sum_{\theta}P(D|\theta)\cdot P(\theta)=4.157722\times10^{-4}$$ Therefore, the posterior probabilities are: $$P(\theta=0.25|D)=\frac{P(D|\theta=0.25)\cdot P(\theta=0.25)}{P(D)}=0.7054333023$$ $$P(\theta=0.5|D)=\frac{P(D|\theta=0.5)\cdot P(\theta=0.5)}{P(D)}=0.2935990252$$ $$P(\theta=0.75|D)=\frac{P(D|\theta=0.75)\cdot P(\theta=0.75)}{P(D)}=0.0009676726$$ R code:
# likelihood of 3 heads and 9 tails for a coin with bias p
likelihood = function(p){
  p^3 * (1 - p)^9
}
likeli = likelihood(c(0.25, 0.5, 0.75))  # likelihood at each candidate bias
prior = c(0.25, 0.5, 0.25)               # prior for theta = 0.25, 0.5, 0.75
evidence = sum(prior * likeli)           # P(D) = sum of prior * likelihood
post = prior * likeli / evidence         # posterior by Bayes' rule
likeli; evidence; post
# [1] 1.173198e-03 2.441406e-04 1.609325e-06
# [1] 0.0004157722
# [1] 0.7054333023 0.2935990252 0.0009676726
Furthermore, we can plot the prior, likelihood, and posterior of the above process:

[Figure: bar plots of the prior, likelihood, and posterior over $\theta=0.25,\ 0.5,\ 0.75$]
In this example, we supposed that the bias $\theta$ could take on only three possible values. This restriction made the model (i.e. $M$) rather simple. Could we get a better result with a more complex model, for instance one with 50 possible values of $\theta$ instead of only 3? The following graph shows the result:
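The helper `plotBayes` called below is not defined in the post. A minimal sketch is given here, under the assumption that it places the candidate biases on the grid $\theta=1/(n+1),\ldots,n/(n+1)$ and uses a triangular prior proportional to $\min(\theta, 1-\theta)$, which reproduces the $(0.25,\ 0.5,\ 0.25)$ prior of the simple model when $n=3$:

```r
# Sketch of plotBayes: grid Bayes' rule for a coin bias theta.
# nVals = number of candidate biases; data = vector of 1s (heads) and 0s (tails).
plotBayes = function(nVals, data){
  theta = (1:nVals) / (nVals + 1)   # grid of candidate biases
  prior = pmin(theta, 1 - theta)    # triangular shape peaked at 0.5 (assumed)
  prior = prior / sum(prior)        # normalize so the prior sums to 1
  likeli = theta^sum(data) * (1 - theta)^sum(data == 0)
  evidence = sum(prior * likeli)    # P(D)
  post = prior * likeli / evidence  # posterior by Bayes' rule
  par(mfrow = c(3, 1))
  plot(theta, prior, type = "h", main = "Prior", ylab = "P(theta)")
  plot(theta, likeli, type = "h", main = "Likelihood", ylab = "P(D|theta)")
  plot(theta, post, type = "h", main = "Posterior", ylab = "P(theta|D)")
  evidence                          # return the evidence P(D)
}
```

With `nVals = 3` and the 3-heads/9-tails data, this sketch returns the same evidence $P(D)=0.000416$ computed above.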
data = c(rep(1, 3), rep(0, 9))
plotBayes(50, data)
[Figure: prior, likelihood, and posterior for the 50-value model with 3 heads and 9 tails]

The second figure indicates that the evidence is $P(D)=0.000393$, a little smaller than that of the previous simple model, $P(D)=0.000416$. In this case, we believe more strongly in the simple model than in the complex one.
Complex models can be winners, however. For example, consider a case in which the observed data contain just 1 head and 11 tails. None of the $\theta$ values in the simple model ($\theta=0.25,\ 0.5,\ 0.75$) is close to this outcome, but the complex model does have some $\theta$ values near the observed proportion ($\theta=1/51,\ldots, 50/51$).
data = c(rep(1, 1), rep(0, 11))
plotBayes(3, data)
plotBayes(50, data)
[Figures: prior, likelihood, and posterior for the simple (3-value) and complex (50-value) models with 1 head and 11 tails]
The above two figures show that the simple model has less evidence $$P(D_{\text{simple}})=0.00276 < P(D_{\text{complex}})=0.00366$$ and thus we have stronger belief in the complex model.


Breast Cancer Example
Published studies have shown that, among women who have a screening mammogram, the proportion who are diagnosed with breast cancer within 1 year is about 0.0045. If a person does have breast cancer (i.e. has a confirmed diagnosis of breast cancer within 1 year after the mammogram), then the probability that the screening mammogram will be positive is about 0.724. On the other hand, if a person does not have breast cancer, then the probability that the screening mammogram will be positive is about 0.027. The question is:
What is the probability that a woman does have breast cancer given that the screening mammogram is positive?
We will use the notation $D_{+},\ D_{-}$ to represent the events that a person has and does not have breast cancer, respectively, and the notation $M_{+},\ M_{-}$ to indicate the events that the person has a positive and a negative mammogram result, respectively. Thus, we can summarize the following: $$P(D_{+})=0.0045\Rightarrow P(D_{-})=1-P(D_{+})=0.9955$$ $$P(M_{+}|D_{+})=0.724\Rightarrow P(M_{-}|D_{+})=1-P(M_{+}|D_{+})=0.276$$ $$P(M_{+}|D_{-})=0.027\Rightarrow P(M_{-}|D_{-})=1-P(M_{+}|D_{-})=0.973$$ Thus the posterior probability is $$P(D_{+}|M_{+})=\frac{P(M_{+}|D_{+})\cdot P(D_{+})}{P(M_{+})}$$ $$=\frac{P(M_{+}|D_{+})\cdot P(D_{+})}{P(M_{+}|D_{+})\cdot P(D_{+})+P(M_{+}|D_{-})\cdot P(D_{-})}$$ $$=\frac{0.724\times0.0045}{0.724\times0.0045+0.027\times0.9955}=0.1081081$$ Furthermore, if the screening mammogram is positive, the person would be recommended to undergo a procedure called SCNB (stereotactic core needle biopsy). If cancerous cells are observed, the test is considered positive and surgery is indicated. According to the published studies, the probability that SCNB will be positive given that the person does have cancer is 0.89, while the probability that SCNB will be negative given that the person does not have cancer is 0.94. The question is:
What is the probability that the woman whose mammogram is positive does have breast cancer given that the SCNB is negative?
We record the new conditional probabilities: $$P(S_{+}|D_{+})=0.89\Rightarrow P(S_{-}|D_{+})=1-P(S_{+}|D_{+})=0.11$$ $$P(S_{-}|D_{-})=0.94\Rightarrow P(S_{+}|D_{-})=1-P(S_{-}|D_{-})=0.06$$ Importantly, the posterior probability from the previous question, $P(D_{+}|M_{+})$, serves as the prior probability $P(D_{+})$ for this question. The posterior probability is: $$P(D_{+}|S_{-})=\frac{P(S_{-}|D_{+})\cdot P(D_{+})}{P(S_{-})}$$ $$=\frac{P(S_{-}|D_{+})\cdot P(D_{+}|M_{+})}{P(S_{-})}$$ $$=\frac{P(S_{-}|D_{+})\cdot P(D_{+}|M_{+})}{P(S_{-}|D_{+})\cdot P(D_{+}|M_{+})+P(S_{-}|D_{-})\cdot (1-P(D_{+}|M_{+}))}$$ $$=\frac{0.11\times0.1081081}{0.11\times0.1081081+0.94\times(1-0.1081081)}=0.01398601$$ Therefore, after a negative SCNB the posterior probability that the woman has breast cancer is only 0.014, much smaller than after the positive mammogram alone, which was 0.108.
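The two sequential updates above can be reproduced in R (the variable names here are ours):

```r
# Mammogram step: update P(D+) given a positive mammogram M+.
prior  = 0.0045                  # P(D+)
sens_m = 0.724                   # P(M+|D+)
fp_m   = 0.027                   # P(M+|D-)
post_m = sens_m * prior / (sens_m * prior + fp_m * (1 - prior))

# SCNB step: the posterior P(D+|M+) becomes the new prior.
miss_s = 0.11                    # P(S-|D+)
spec_s = 0.94                    # P(S-|D-)
post_s = miss_s * post_m / (miss_s * post_m + spec_s * (1 - post_m))

post_m; post_s
# [1] 0.1081081
# [1] 0.01398601
```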


References:
Kruschke, J. K. (2010). Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Chapter 4.
Cowles, M. K. (2013). Applied Bayesian Statistics: With R and OpenBUGS Examples. Chapter 1.