Bayes background - (messy - made for my own recap)
LDA
BAYES STATISTICS
LDA is an unsupervised learning method which originates from Bayesian statistics.
BAYES FORMULA
General shape of Bayes' formula in probability theory:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Thorough tutorial on Bayes' rule in probability theory - includes the likelihood ratio.
An example of this would be to substitute the variables like this:
- A - is sick
- B - test shows sick
- Then we would be asking for the probability of somebody being sick given that the test shows sickness, i.e. P(A | B); a small numeric sketch follows this list.
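A minimal sketch of that calculation in Python - the prevalence, sensitivity and false-positive numbers below are made up for illustration, not taken from anywhere:

```python
# Bayes' rule for the sick/test example: P(sick | positive test).
# All three input numbers below are assumptions for illustration.
p_sick = 0.01              # P(A): prior probability of being sick (prevalence)
p_pos_given_sick = 0.95    # P(B|A): test sensitivity
p_pos_given_healthy = 0.05 # P(B|not A): false positive rate

# Marginal P(B): total probability of a positive test.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Posterior P(A|B) via Bayes' formula.
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # ~0.161: still unlikely, despite the positive test
```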
BAYES FORMULA IN BAYESIAN INFERENCE
In Bayesian statistics we suppose that there is a distribution over θ. In Bayesian statistics θ varies; we assume that the data is fixed.
In statistics, the main aim of Bayes' rule is to calculate the probability of θ (the parameter or parameters) given the data:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$

We often do this incrementally, i.e. we are improving/deepening our belief in certain parameter values as the amount of data "going through" our calculations grows.
- All the elements in the above formula depend on a MODEL we choose. The model becomes important in particular for the marginal, which has to explain what the probability of the data is. It is, roughly, the probability of the data under the model and the chosen parameters.
Likelihood - it is not a probability; on its own it does not integrate to 1 when we vary θ. This makes intuitive sense, because for the same data all the parameter values together produce multiple views, and these views cumulatively don't form a full probability distribution. However, if we include our beliefs about the parameters, we get the marginal in Bayes' formula, $P(X) = \int P(X \mid \theta)\,P(\theta)\,d\theta$. Should that integrate to 1? Maybe not - it actually sums up to whatever the probability of the data is, considering the different parameters and their probabilities.
From that I got the idea that maybe even the marginal integrates to one - but never mind that; in Bayesian statistics it is in the end a constant anyway. Continuing with it tomorrow. https://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions - I don't think P(X) will be larger than 1; it will still be a probability. It won't equal 1 either - this was my mix-up point.
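A minimal numeric check of that mix-up point, using a Bernoulli model with a uniform prior on θ (my own toy example, not from the notes):

```python
import numpy as np

# Bernoulli model: 10 trials, 4 successes; uniform prior on theta in [0, 1].
n, k = 10, 4
theta = np.linspace(0, 1, 10_001)
d = theta[1] - theta[0]
likelihood = theta**k * (1 - theta)**(n - k)  # P(X | theta), no binomial coefficient

# The likelihood does NOT integrate to 1 over theta - there is no such requirement.
print(likelihood.sum() * d)                   # ~0.00043, nowhere near 1

# The marginal P(X) is a prior-weighted average of likelihoods: a constant in [0, 1].
prior = np.ones_like(theta)                   # uniform prior density
marginal = (likelihood * prior).sum() * d

# The posterior, however, DOES integrate to 1 - the marginal is its normaliser.
posterior = likelihood * prior / marginal
print(posterior.sum() * d)                    # ~1.0
```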
CONJUGATE PRIOR AND POSTERIOR. FUNCTION FAMILIES <- EXPONENTIAL FAMILY
- Conjugacy - also a video from Oxbridge.
BETA DISTRIBUTION - DIRICHLET DISTRIBUTION
RECAP OF /Bayes Materials Bayes Method/Bayesgothrouhgh.pdf
- Here we use an example of 2 URNS with red and green balls: A & B. 1/3 of balls in A are red and 2/3 of balls in B are red.
- We chose a random urn, not knowing which one it is. Now we want to take a sample of size n (the data, which we will call X) from the urn and form a belief about which urn we took, i.e. we want P(A | X) and P(B | X).
Initially our beliefs (prior model) of having taken urn A or urn B are both 1/2:
- P(A) = P(B) = 1/2.
Let's calculate the likelihood of getting X given A and given B. Let's say we got k red balls in the sample of size n. Then the likelihoods for A and for B will be:

$$P(X \mid A) = \binom{n}{k}\left(\tfrac{1}{3}\right)^{k}\left(\tfrac{2}{3}\right)^{n-k}, \qquad P(X \mid B) = \binom{n}{k}\left(\tfrac{2}{3}\right)^{k}\left(\tfrac{1}{3}\right)^{n-k}$$
The formula for the likelihood - the MODEL mentioned previously - comes from the Bernoulli (binomial) distribution, where p is the probability of a red ball. In this case p is the parameter of the Bernoulli distribution, and we have two p's: p = 1/3, the probability of getting a red ball from A, and p = 2/3, the probability of getting a red ball from B. This p is the parameter in the Bayesian inference here.
The overall probability of getting the data according to the Bernoulli model (the marginal) is just going to be a sum over all the parameters:

$$P(X) = P(X \mid A)\,P(A) + P(X \mid B)\,P(B)$$
Now we have all the variables for using Bayes' formula to get the posterior model (our belief in all the parameters after seeing the data).
Reminder of the general shape of the formula for Bayesian inference:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$
Calculating the posterior:

$$P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X \mid A)\,P(A) + P(X \mid B)\,P(B)}, \qquad P(B \mid X) = 1 - P(A \mid X)$$
For example, suppose that in n = 10 trials we got k = 4 red balls. The posterior probabilities would become P(A | X) = 0.8 and P(B | X) = 0.2: the binomial coefficients cancel, and (1/3)^4 (2/3)^6 : (2/3)^4 (1/3)^6 = 64 : 16.
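A minimal sketch of the whole urn computation (the helper function name is mine):

```python
from math import comb

def urn_posterior(n: int, k: int, prior_a: float = 0.5) -> tuple[float, float]:
    """Posterior P(A|X), P(B|X) after drawing k red balls in n draws."""
    # Likelihoods under each urn's red-ball probability (1/3 for A, 2/3 for B).
    like_a = comb(n, k) * (1/3)**k * (2/3)**(n - k)
    like_b = comb(n, k) * (2/3)**k * (1/3)**(n - k)
    # Marginal P(X): prior-weighted sum of the likelihoods.
    marginal = like_a * prior_a + like_b * (1 - prior_a)
    return like_a * prior_a / marginal, like_b * (1 - prior_a) / marginal

print(urn_posterior(10, 4))  # (0.8, 0.2) - matches the hand calculation above
```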
Generalisation of the previous
What Bayesian statistics does is replace this concept of likelihood by a real probability. In order to do that, we treat the parameter θ as a random variable rather than an unknown constant. This random variable itself has a probability distribution which integrates to 1, $\int P(\theta)\,d\theta = 1$. This is called the prior distribution on θ.
BETA DISTRIBUTION
Here I did a short detour to YouTube.
- Likelihood is not a probability: it does not in itself integrate to 1 if we vary θ. There is no requirement that it should.

- Conjugate prior definition. CONJUGACY: if we choose the prior and the likelihood from certain matching distribution families, the posterior will come out in the same family as the prior.
- Proof that the Beta distribution is conjugate to the Bernoulli/binomial likelihood - a sketch of the derivation is below.
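The standard derivation, written out as a sketch for my own recap:

```latex
% Sketch: the Beta prior is conjugate to the Bernoulli/binomial likelihood.
% Prior Beta(\alpha, \beta); data: k successes in n trials.
P(\theta \mid X)
  \propto \binom{n}{k} \theta^{k} (1-\theta)^{n-k}        % likelihood
          \cdot \theta^{\alpha-1} (1-\theta)^{\beta-1}    % Beta prior kernel
  \propto \theta^{(\alpha+k)-1} (1-\theta)^{(\beta+n-k)-1}
% ...which is the kernel of Beta(\alpha + k, \beta + n - k):
% the posterior stays in the Beta family, which is exactly conjugacy.
```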
Beta Distribution
Basically the Beta distribution allows us to perform posterior inference. That means getting the posterior DISTRIBUTION from the prior and the data through a simple modification of the formula's parameters. Example of this:
- Data (X) has a binomial distribution. We performed N = 10 experiments and got X = 1 success.
- θ has a Beta distribution.
- The initial Beta distribution is Beta(α, β); after inference it is going to be the distribution Beta(α + X, β + N − X), i.e. Beta(α + 1, β + 9) for this data (sketch below).
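A minimal sketch of that update, assuming a uniform Beta(1, 1) prior - the prior parameters are my assumption, since the original values did not survive in these notes:

```python
from scipy.stats import beta

# Assumed prior: uniform Beta(1, 1). Data: X = 1 success in N = 10 trials.
alpha0, beta0 = 1, 1
N, X = 10, 1

# Conjugate update: add successes to alpha, failures to beta.
alpha_post, beta_post = alpha0 + X, beta0 + (N - X)

posterior = beta(alpha_post, beta_post)
print(alpha_post, beta_post)  # Beta(2, 10)
print(posterior.mean())       # 2/12 ~ 0.167, between prior mean 0.5 and data mean 0.1
```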

Additional info
- The prior gets less important as the size of N increases; with more data the likelihood dominates.
- The bigger the size of the experiment N, the larger the effect of the data relative to the prior.
- The posterior is roughly a weighted mean between the prior distribution and the data distribution. How much effect the prior and the data each have depends on the amount of (pseudo-)data they are based on; see the sketch after this list.
- This video playlist also explains the normal distribution, the Poisson distribution and the gamma distribution.
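That "weighted mean" point can be made exact in the Beta-binomial case; a sketch of the standard identity:

```latex
% Posterior mean of Beta(\alpha + k, \beta + n - k) as a weighted mean:
\mathbb{E}[\theta \mid X] = \frac{\alpha + k}{\alpha + \beta + n}
  = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{w}
      \cdot \frac{\alpha}{\alpha + \beta}
  + \underbrace{\frac{n}{\alpha + \beta + n}}_{1 - w}
      \cdot \frac{k}{n}
% Prior mean \alpha/(\alpha+\beta) weighted by the prior's pseudo-count \alpha+\beta;
% data mean k/n weighted by the real sample size n. As n grows, w -> 0.
```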