Bayes background - (messy - made for my own recap)
LDA
BAYES STATISTICS
LDA is an unsupervised learning method which originates from Bayesian statistics.
BAYES FORMULA
General shape of Bayes' formula in probability theory:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Thorough tutorial on Bayes' rule in probability theory - includes the likelihood ratio.
An example of this would be to substitute the variables like this:
- A - is sick
- B - test shows sick
- Then we would be asking for the probability of somebody being sick given that the test shows sickness, i.e. P(A | B); a small numeric sketch follows this list.
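A minimal sketch of that calculation in Python - the prevalence, sensitivity and false-positive numbers below are made up for illustration, not taken from anywhere:

```python
# Bayes' rule for the sick/test example: P(sick | positive test).
# All three input numbers below are assumptions for illustration.
p_sick = 0.01              # P(A): prior probability of being sick (prevalence)
p_pos_given_sick = 0.95    # P(B|A): test sensitivity
p_pos_given_healthy = 0.05 # P(B|not A): false positive rate

# Marginal P(B): total probability of a positive test.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Posterior P(A|B) via Bayes' formula.
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # ~0.161: still unlikely, despite the positive test
```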
BAYES FORMULA IN BAYESIAN INFERENCE
In Bayesian statistics we suppose that there is a distribution over θ. In Bayesian statistics θ varies; we assume that the data is fixed.
In statistics, the main aim of Bayes' rule is to calculate the probability of θ (the parameter or parameters) given the data:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$

We often do this incrementally, i.e. we are improving/deepening our belief in certain parameter values as the amount of data "going through" our calculations grows.
- All the elements in the above formula depend on a MODEL we choose. The model becomes important in particular for the marginal, which has to explain what the probability of the data is. It is, roughly, the probability of the data under the model and the chosen parameters.
Likelihood - it is not a probability; on its own it does not integrate to 1 when we vary θ. This makes intuitive sense, because for the same data all the parameter values together produce multiple views, and these views cumulatively don't form a full probability distribution. However, if we include our beliefs about the parameters, we get the marginal in Bayes' formula, $P(X) = \int P(X \mid \theta)\,P(\theta)\,d\theta$. Should that integrate to 1? Maybe not - it actually sums up to whatever the probability of the data is, considering the different parameters and their probabilities.
From that I got the idea that maybe even the marginal integrates to one - but never mind that; in Bayesian statistics it is in the end a constant anyway. Continuing with it tomorrow. https://stats.stackexchange.com/questions/58564/help-me-understand-bayesian-prior-and-posterior-distributions - I don't think P(X) will be larger than 1; it will still be a probability. It won't equal 1 either - this was my mix-up point.
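A minimal numeric check of that mix-up point, using a Bernoulli model with a uniform prior on θ (my own toy example, not from the notes):

```python
import numpy as np

# Bernoulli model: 10 trials, 4 successes; uniform prior on theta in [0, 1].
n, k = 10, 4
theta = np.linspace(0, 1, 10_001)
d = theta[1] - theta[0]
likelihood = theta**k * (1 - theta)**(n - k)  # P(X | theta), no binomial coefficient

# The likelihood does NOT integrate to 1 over theta - there is no such requirement.
print(likelihood.sum() * d)                   # ~0.00043, nowhere near 1

# The marginal P(X) is a prior-weighted average of likelihoods: a constant in [0, 1].
prior = np.ones_like(theta)                   # uniform prior density
marginal = (likelihood * prior).sum() * d

# The posterior, however, DOES integrate to 1 - the marginal is its normaliser.
posterior = likelihood * prior / marginal
print(posterior.sum() * d)                    # ~1.0
```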
CONJUGATE PRIOR AND POSTERIOR. FUNCTION FAMILIES <- EXPONENTIAL FAMILY
- Conjugacy - also a video from Oxbridge.
BETA DISTRIBUTION - DIRICHLET DISTRIBUTION
RECAP OF /Bayes Materials Bayes Method/Bayesgothrouhgh.pdf
- Here we use an example of 2 URNS with red and green balls: A & B. 1/3 of balls in A are red and 2/3 of balls in B are red.
- We chose a random urn, not knowing which one it is. Now we want to take a sample of size n (the data, which we will call X) from the urn and form a belief about which urn we took, i.e. we want P(A | X) and P(B | X).
Initially our beliefs (prior model) of having taken urn A or urn B are both 1/2:
- P(A) = P(B) = 1/2.
Let's calculate the likelihood of getting X given A and given B. Let's say we got k red balls in the sample of size n. Then the likelihoods for A and for B will be:

$$P(X \mid A) = \binom{n}{k}\left(\tfrac{1}{3}\right)^{k}\left(\tfrac{2}{3}\right)^{n-k}, \qquad P(X \mid B) = \binom{n}{k}\left(\tfrac{2}{3}\right)^{k}\left(\tfrac{1}{3}\right)^{n-k}$$
The formula for the likelihood - the MODEL mentioned previously - comes from the Bernoulli (binomial) distribution, where p is the probability of a red ball. In this case p is the parameter of the Bernoulli distribution, and we have two p's: p = 1/3, the probability of getting a red ball from A, and p = 2/3, the probability of getting a red ball from B. This p is the parameter in the Bayesian inference here.
The overall probability of getting the data according to the Bernoulli model (the marginal) is just going to be a sum over all the parameters:

$$P(X) = P(X \mid A)\,P(A) + P(X \mid B)\,P(B)$$
Now we have all the variables for using Bayes' formula to get the posterior model (our belief in all the parameters after seeing the data).
Reminder of the general shape of the formula for Bayesian inference:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$
Calculating the posterior:

$$P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X \mid A)\,P(A) + P(X \mid B)\,P(B)}, \qquad P(B \mid X) = 1 - P(A \mid X)$$
For example, suppose that in n = 10 trials we got k = 4 red balls. The posterior probabilities would become P(A | X) = 0.8 and P(B | X) = 0.2: the binomial coefficients cancel, and (1/3)^4 (2/3)^6 : (2/3)^4 (1/3)^6 = 64 : 16.
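A minimal sketch of the whole urn computation (the helper function name is mine):

```python
from math import comb

def urn_posterior(n: int, k: int, prior_a: float = 0.5) -> tuple[float, float]:
    """Posterior P(A|X), P(B|X) after drawing k red balls in n draws."""
    # Likelihoods under each urn's red-ball probability (1/3 for A, 2/3 for B).
    like_a = comb(n, k) * (1/3)**k * (2/3)**(n - k)
    like_b = comb(n, k) * (2/3)**k * (1/3)**(n - k)
    # Marginal P(X): prior-weighted sum of the likelihoods.
    marginal = like_a * prior_a + like_b * (1 - prior_a)
    return like_a * prior_a / marginal, like_b * (1 - prior_a) / marginal

print(urn_posterior(10, 4))  # (0.8, 0.2) - matches the hand calculation above
```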
Generalisation of the previous
What Bayesian statistics does is replace this concept of likelihood by a real probability. In order to do that, we treat the parameter θ as a random variable rather than an unknown constant. This random variable itself has a probability distribution which integrates to 1, $\int P(\theta)\,d\theta = 1$. This is called the prior distribution on θ.
BETA DISTRIBUTION
Here I did a short detour to YouTube.
- Likelihood is not a probability: it does not in itself integrate to 1 if we vary θ. There is no requirement that it should.

- Conjugate prior definition. CONJUGACY: if we choose the prior and the likelihood from certain matching distribution families, the posterior will come out in the same family as the prior.
- Proof that the Beta distribution is conjugate to the Bernoulli/binomial likelihood - a sketch of the derivation is below.
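The standard derivation, written out as a sketch for my own recap:

```latex
% Sketch: the Beta prior is conjugate to the Bernoulli/binomial likelihood.
% Prior Beta(\alpha, \beta); data: k successes in n trials.
P(\theta \mid X)
  \propto \binom{n}{k} \theta^{k} (1-\theta)^{n-k}        % likelihood
          \cdot \theta^{\alpha-1} (1-\theta)^{\beta-1}    % Beta prior kernel
  \propto \theta^{(\alpha+k)-1} (1-\theta)^{(\beta+n-k)-1}
% ...which is the kernel of Beta(\alpha + k, \beta + n - k):
% the posterior stays in the Beta family, which is exactly conjugacy.
```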
Beta Distribution
Basically the Beta distribution allows us to perform posterior inference. That means getting the posterior DISTRIBUTION from the prior and the data through a simple modification of the formula's parameters. Example of this:
- Data (X) has a binomial distribution. We performed N = 10 experiments and got X = 1 success.
- θ has a Beta distribution.
- The initial Beta distribution is Beta(α, β); after inference it is going to be the distribution Beta(α + X, β + N − X), i.e. Beta(α + 1, β + 9) for this data (sketch below).
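A minimal sketch of that update, assuming a uniform Beta(1, 1) prior - the prior parameters are my assumption, since the original values did not survive in these notes:

```python
from scipy.stats import beta

# Assumed prior: uniform Beta(1, 1). Data: X = 1 success in N = 10 trials.
alpha0, beta0 = 1, 1
N, X = 10, 1

# Conjugate update: add successes to alpha, failures to beta.
alpha_post, beta_post = alpha0 + X, beta0 + (N - X)

posterior = beta(alpha_post, beta_post)
print(alpha_post, beta_post)  # Beta(2, 10)
print(posterior.mean())       # 2/12 ~ 0.167, between prior mean 0.5 and data mean 0.1
```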

Additional info
- The prior gets less important as the size of N increases; with more data the likelihood dominates.
- The bigger the size of the experiment N, the larger the effect of the data relative to the prior.
- The posterior is roughly a weighted mean between the prior distribution and the data distribution. How much effect the prior and the data each have depends on the amount of (pseudo-)data they are based on; see the sketch after this list.
- This video playlist also explains the normal distribution, the Poisson distribution and the gamma distribution.
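That "weighted mean" point can be made exact in the Beta-binomial case; a sketch of the standard identity:

```latex
% Posterior mean of Beta(\alpha + k, \beta + n - k) as a weighted mean:
\mathbb{E}[\theta \mid X] = \frac{\alpha + k}{\alpha + \beta + n}
  = \underbrace{\frac{\alpha + \beta}{\alpha + \beta + n}}_{w}
      \cdot \frac{\alpha}{\alpha + \beta}
  + \underbrace{\frac{n}{\alpha + \beta + n}}_{1 - w}
      \cdot \frac{k}{n}
% Prior mean \alpha/(\alpha+\beta) weighted by the prior's pseudo-count \alpha+\beta;
% data mean k/n weighted by the real sample size n. As n grows, w -> 0.
```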