CS Wiki | Cedric Schwyter

Probability Theory

Definitions and Basic Terms

💡 A discrete probability space is defined by the sample space (possibility space) $\Omega = \{\omega_1, \omega_2, \dots\}$ of elementary events. An (elementary) probability $\Pr[\omega_i]$ is assigned to every elementary event $\omega_i$, where we require $0 \le \Pr[\omega_i] \le 1$ and

$$\sum_{\omega \in \Omega} \Pr[\omega] = 1.$$

A set $E \subseteq \Omega$ is called an event. The probability $\Pr[E]$ of an event $E$ is defined by

$$\Pr[E] := \sum_{\omega \in E} \Pr[\omega].$$

If $E$ is an event, then we define $\bar{E} := \Omega \setminus E$ as the complementary event of $E$.

📌 For events $A, B$ the following hold:

  1. If $A \subseteq B$, then it follows that $\Pr[A] \le \Pr[B]$.

📖 (Addition theorem).

If the events $A_1, \dots, A_n$ are pairwise disjoint (i.e., if it holds for all pairs $i \ne j$ that $A_i \cap A_j = \emptyset$), then it holds that

$$\Pr\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{i=1}^{n} \Pr[A_i].$$

For an infinite set of pairwise disjoint events $A_1, A_2, \dots$ it holds analogously that

$$\Pr\left[\bigcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} \Pr[A_i].$$

📖 (The sieve formula, inclusion-exclusion principle).

For events $A_1, \dots, A_n$ the following holds:

$$\Pr\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{\ell=1}^{n} (-1)^{\ell+1} \sum_{1 \le i_1 < \dots < i_\ell \le n} \Pr[A_{i_1} \cap \dots \cap A_{i_\ell}].$$

📎 (Boole’s Inequality, Union Bound).

For events $A_1, \dots, A_n$ the following holds:

$$\Pr\left[\bigcup_{i=1}^{n} A_i\right] \le \sum_{i=1}^{n} \Pr[A_i].$$

Analogously, the following holds for an infinite sequence of events $A_1, A_2, \dots$:

$$\Pr\left[\bigcup_{i=1}^{\infty} A_i\right] \le \sum_{i=1}^{\infty} \Pr[A_i].$$

💡 (Principle of Laplace).

If nothing indicates otherwise, we assume that all elementary events are equally likely.

Therefore, $\Pr[\omega] = \frac{1}{|\Omega|}$ for all elementary events $\omega \in \Omega$.

It immediately follows for an arbitrary event $E$ that

$$\Pr[E] = \frac{|E|}{|\Omega|}.$$

We say the experiment modeled on $\Omega$ is uniformly distributed or equally distributed.

💡 In an information-theoretical sense, such a probability space ($\Pr[\omega] = \frac{1}{|\Omega|}$ for all $\omega \in \Omega$) has the largest possible entropy. Every deviation from the uniform distribution requires that we put more information into the model (and thereby decrease entropy).
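
As a small sanity check of $\Pr[E] = |E|/|\Omega|$, here is a sketch in Python; the two-dice sample space and the event chosen are illustrative assumptions, not from the original text:

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice: 36 equally likely elementary events
# (principle of Laplace).
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    """Pr[E] = |E| / |Omega| for a uniformly distributed space."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

# Event "the two dice sum to 7": 6 of the 36 outcomes.
p_seven = pr(lambda w: w[0] + w[1] == 7)
print(p_seven)  # 1/6
```

Exact `Fraction` arithmetic avoids floating-point noise when comparing probabilities.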

Conditional Probability

💡 Let $A$ and $B$ be events with $\Pr[B] > 0$. The conditional probability $\Pr[A \mid B]$ of $A$ given $B$ is defined by

$$\Pr[A \mid B] := \frac{\Pr[A \cap B]}{\Pr[B]}.$$

💡 The conditional probabilities of the form $\Pr[\,\cdot \mid B]$ form a new probability space over $\Omega$ for an arbitrary event $B \subseteq \Omega$ with $\Pr[B] > 0$.

The probabilities of the elementary events $\omega_i$ are calculated through $\Pr[\omega_i \mid B]$. Then

$$\sum_{\omega \in \Omega} \Pr[\omega \mid B] = 1,$$

so the definition of a discrete probability space is still fulfilled, and therefore all rules for probabilities also apply to conditional probabilities.
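
A minimal sketch of the definition $\Pr[A \mid B] = \Pr[A \cap B] / \Pr[B]$, again on the assumed two-dice space:

```python
from fractions import Fraction
from itertools import product

# Two fair dice, uniformly distributed sample space.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def pr_given(a, b):
    """Conditional probability Pr[A | B] = Pr[A and B] / Pr[B]; needs Pr[B] > 0."""
    return pr(lambda w: a(w) and b(w)) / pr(b)

# Pr[sum = 8 | first die shows 6]: of the 6 outcomes with first die 6,
# exactly one, namely (6, 2), has sum 8.
p = pr_given(lambda w: w[0] + w[1] == 8, lambda w: w[0] == 6)
print(p)  # 1/6
```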

📖 (Multiplication theorem).

Let the events $A_1, \dots, A_n$ be given. If $\Pr[A_1 \cap \dots \cap A_n] > 0$, then the following holds:

$$\Pr[A_1 \cap \dots \cap A_n] = \Pr[A_1] \cdot \Pr[A_2 \mid A_1] \cdot \Pr[A_3 \mid A_1 \cap A_2] \cdots \Pr[A_n \mid A_1 \cap \dots \cap A_{n-1}].$$

📖 (Law of total probability).

Let the events $A_1, \dots, A_n$ be pairwise disjoint and let $B \subseteq A_1 \cup \dots \cup A_n$. Then it follows that

$$\Pr[B] = \sum_{i=1}^{n} \Pr[B \mid A_i] \cdot \Pr[A_i].$$

Analogously, for pairwise disjoint events $A_1, A_2, \dots$ with $B \subseteq \bigcup_{i=1}^{\infty} A_i$ it follows that

$$\Pr[B] = \sum_{i=1}^{\infty} \Pr[B \mid A_i] \cdot \Pr[A_i].$$

📖 (Bayes’ theorem).

Let the events $A_1, \dots, A_n$ be pairwise disjoint with $\Pr[A_j] > 0$ for all $j$. Furthermore, let $B \subseteq A_1 \cup \dots \cup A_n$ be an event with $\Pr[B] > 0$. Then it holds for an arbitrary $i = 1, \dots, n$ that

$$\Pr[A_i \mid B] = \frac{\Pr[A_i \cap B]}{\Pr[B]} = \frac{\Pr[B \mid A_i] \cdot \Pr[A_i]}{\sum_{j=1}^{n} \Pr[B \mid A_j] \cdot \Pr[A_j]}.$$

Analogously, for pairwise disjoint events $A_1, A_2, \dots$ with $B \subseteq \bigcup_{i=1}^{\infty} A_i$ it holds that

$$\Pr[A_i \mid B] = \frac{\Pr[B \mid A_i] \cdot \Pr[A_i]}{\sum_{j=1}^{\infty} \Pr[B \mid A_j] \cdot \Pr[A_j]}.$$
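
A numeric sketch of the law of total probability and Bayes’ theorem; the disease/test numbers below are invented purely for illustration:

```python
from fractions import Fraction

# Hypothetical setting: disease D with prevalence 1 %, a test T with
# sensitivity Pr[T | D] = 99 % and false-positive rate Pr[T | not D] = 5 %.
p_d = Fraction(1, 100)
p_t_d = Fraction(99, 100)
p_t_nd = Fraction(5, 100)

# Law of total probability with the partition {D, not D}:
# Pr[T] = Pr[T | D] Pr[D] + Pr[T | not D] Pr[not D]
p_t = p_t_d * p_d + p_t_nd * (1 - p_d)

# Bayes' theorem: Pr[D | T] = Pr[T | D] Pr[D] / Pr[T]
p_d_t = p_t_d * p_d / p_t
print(p_d_t)  # 1/6
```

Despite the accurate test, a positive result only implies the disease with probability $1/6$, because the disease is rare.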

Independence

💡 The events $A$ and $B$ are called independent if

$$\Pr[A \cap B] = \Pr[A] \cdot \Pr[B].$$

💡 The events $A_1, \dots, A_n$ are called independent if it holds for all subsets $I \subseteq \{1, \dots, n\}$ with $I = \{i_1, \dots, i_k\}$ that

$$\Pr[A_{i_1} \cap \dots \cap A_{i_k}] = \Pr[A_{i_1}] \cdots \Pr[A_{i_k}].$$

An infinite family of events $A_i$ with $i \in \mathbb{N}$ is called independent if the above condition is met for all finite subsets $I \subseteq \mathbb{N}$.

📌 The events $A_1, \dots, A_n$ are independent if and only if it holds for all $(s_1, \dots, s_n) \in \{0, 1\}^n$ that

$$\Pr[A_1^{s_1} \cap \dots \cap A_n^{s_n}] = \Pr[A_1^{s_1}] \cdots \Pr[A_n^{s_n}],$$

where $A_i^0 = \bar{A_i}$ and $A_i^1 = A_i$.

📌 Let $A$, $B$ and $C$ be independent events. Then $A \cap B$ and $C$, respectively $A \cup B$ and $C$, are also independent.
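
The definition $\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$ can be checked exhaustively on a small space; the two-dice events below are illustrative assumptions:

```python
from fractions import Fraction
from itertools import product

# Two fair dice; test the independence condition directly.
omega = list(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

a = lambda w: w[0] % 2 == 0   # first die is even
b = lambda w: w[1] <= 2       # second die shows 1 or 2

# Events that only constrain different dice are independent:
independent = pr(lambda w: a(w) and b(w)) == pr(a) * pr(b)

c = lambda w: w[0] + w[1] == 6  # the two dice sum to 6

# A and C are dependent: Pr[A and C] = 1/18 but Pr[A] * Pr[C] = 5/72.
dependent = pr(lambda w: a(w) and c(w)) != pr(a) * pr(c)
print(independent, dependent)  # True True
```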

Random Variables

💡 A random variable is a mapping $X : \Omega \to \mathbb{R}$, where $\Omega$ is the possibility space of a probability space.

💡 In discrete probability spaces the codomain of a random variable

$$W_X := X(\Omega) = \{x \in \mathbb{R} \mid \exists \omega \in \Omega : X(\omega) = x\}$$

is in all cases finite or countably infinite, depending on whether $\Omega$ is finite or countably infinite.

💡 When studying a random variable $X$ one is interested in the probabilities with which $X$ assumes a specific value. For $x \in W_X$ we regard the event $A_x := \{\omega \in \Omega \mid X(\omega) = x\}$ and abbreviate $\Pr[A_x]$ as $\Pr[X = x]$; events such as $X \le x$ are treated analogously.

Therefore, we can define

$$\Pr[X = x] = \sum_{\omega \in \Omega : X(\omega) = x} \Pr[\omega].$$

💡 The function

$$f_X : \mathbb{R} \to [0, 1], \quad x \mapsto \Pr[X = x]$$

is called the density (function) of $X$.

💡 The function

$$F_X : \mathbb{R} \to [0, 1], \quad x \mapsto \Pr[X \le x] = \sum_{x' \in W_X : x' \le x} \Pr[X = x']$$

is called the distribution (function) of $X$.

Expected Value

💡 For a random variable $X$ we define the expected value $\mathbb{E}[X]$ as

$$\mathbb{E}[X] := \sum_{x \in W_X} x \cdot \Pr[X = x],$$

if that sum converges absolutely. Otherwise the expected value is said to be undefined.

📌 If $X$ is a random variable, then the following holds:

$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega) \cdot \Pr[\omega].$$

📖 Let $X$ be a random variable with $W_X \subseteq \mathbb{N}_0$. Then it holds that

$$\mathbb{E}[X] = \sum_{i=1}^{\infty} \Pr[X \ge i].$$
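
The tail-sum formula can be verified exactly for a small example; a fair die is assumed here for illustration:

```python
from fractions import Fraction

# Fair six-sided die X with W_X = {1, ..., 6}, a subset of the naturals.
density = {k: Fraction(1, 6) for k in range(1, 7)}

# Direct definition: E[X] = sum over x of x * Pr[X = x]
e_direct = sum(k * p for k, p in density.items())

# Tail-sum formula: E[X] = sum over i >= 1 of Pr[X >= i]
e_tail = sum(sum(p for k, p in density.items() if k >= i) for i in range(1, 7))

assert e_direct == e_tail == Fraction(7, 2)
```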

💡 Conditional Random Variables

Let $X$ be a random variable and $A$ an event with $\Pr[A] > 0$. By $X \mid A$ we denote that we calculate the probabilities with which the random variable $X$ assumes specific values with respect to the probabilities conditioned on $A$. It thus holds that

$$\mathbb{E}[X \mid A] = \sum_{x \in W_X} x \cdot \Pr[X = x \mid A].$$

📖 Let $X$ be a random variable. For pairwise disjoint events $A_1, \dots, A_n$ with $A_1 \cup \dots \cup A_n = \Omega$ and $\Pr[A_i] > 0$ for all $i$ it holds that

$$\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X \mid A_i] \cdot \Pr[A_i].$$

For pairwise disjoint events $A_1, A_2, \dots$ with $\bigcup_{i=1}^{\infty} A_i = \Omega$ and $\Pr[A_i] > 0$ for all $i$ it holds analogously that

$$\mathbb{E}[X] = \sum_{i=1}^{\infty} \mathbb{E}[X \mid A_i] \cdot \Pr[A_i].$$

💡 Linearity of the Expected Value

Assume we have defined $n$ random variables $X_1, \dots, X_n$ over the same probability space.

For an $\omega \in \Omega$ we thus receive $n$ real numbers $X_1(\omega), \dots, X_n(\omega)$. When we define a function $f : \mathbb{R}^n \to \mathbb{R}$, we immediately see that the composition $X := f(X_1, \dots, X_n)$ in turn is also a random variable, for it holds that:

$$X(\omega) = f(X_1(\omega), \dots, X_n(\omega)).$$

This holds for arbitrary functions $f$, in particular for affine linear functions

$$f(x_1, \dots, x_n) = a_1 x_1 + \dots + a_n x_n + b,$$

where $a_1, \dots, a_n, b$ are arbitrary real numbers. In this case we usually denote the random variable explicitly as

$$X = a_1 X_1 + \dots + a_n X_n + b.$$

📖 (Linearity of the Expected Value).

For random variables $X_1, \dots, X_n$ and $X := a_1 X_1 + \dots + a_n X_n + b$ with $a_1, \dots, a_n, b \in \mathbb{R}$ it holds that

$$\mathbb{E}[X] = a_1 \mathbb{E}[X_1] + \dots + a_n \mathbb{E}[X_n] + b.$$

💡 For an event $A$ the corresponding indicator variable $I_A$ is defined by:

$$I_A(\omega) := \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{otherwise}. \end{cases}$$

For the expected value of $I_A$ it holds that: $\mathbb{E}[I_A] = 1 \cdot \Pr[A] + 0 \cdot \Pr[\bar{A}] = \Pr[A]$.
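
Indicator variables plus linearity are the standard counting trick; a small sketch, where the fixed-point example is an illustration and not from the original text:

```python
from fractions import Fraction
from itertools import permutations

# X = number of fixed points of a uniformly random permutation of n elements.
# With indicators I_i = [position i is a fixed point], E[I_i] = 1/n, so by
# linearity E[X] = n * (1/n) = 1, independent of n. Verified by enumeration:
n = 4
perms = list(permutations(range(n)))
e_fixed = Fraction(sum(sum(1 for i in range(n) if p[i] == i) for p in perms),
                   len(perms))
assert e_fixed == 1
```

Note that the indicators here are not independent; linearity holds regardless.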

Variance

💡 For a random variable $X$ with $\mu := \mathbb{E}[X]$ we define the variance $\mathrm{Var}[X]$ as

$$\mathrm{Var}[X] := \mathbb{E}[(X - \mu)^2] = \sum_{x \in W_X} (x - \mu)^2 \cdot \Pr[X = x].$$

The quantity $\sigma := \sqrt{\mathrm{Var}[X]}$ is called the standard deviation of $X$.

📖 For an arbitrary random variable $X$ it holds that

$$\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2.$$

📖 For an arbitrary random variable $X$ and $a, b \in \mathbb{R}$ it holds that

$$\mathrm{Var}[a \cdot X + b] = a^2 \cdot \mathrm{Var}[X].$$

💡 For a random variable $X$ we call $\mathbb{E}[X^k]$ the $k$-th moment and $\mathbb{E}[(X - \mathbb{E}[X])^k]$ the $k$-th central moment.

The expected value is therefore identical to the first moment, and the variance is identical to the second central moment.
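
The identity $\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$ can be checked exactly; a fair die is assumed for illustration:

```python
from fractions import Fraction

# Fair die: compare the defining formula for the variance with the
# moment identity Var[X] = E[X^2] - E[X]^2.
density = {k: Fraction(1, 6) for k in range(1, 7)}
e1 = sum(k * p for k, p in density.items())                  # first moment E[X]
e2 = sum(k**2 * p for k, p in density.items())               # second moment E[X^2]
var_def = sum((k - e1)**2 * p for k, p in density.items())   # E[(X - E[X])^2]
assert var_def == e2 - e1**2 == Fraction(35, 12)
```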

Important Discrete Probability Distributions

💡 Recall the probability density function $f_X$ and the probability distribution function $F_X$:

$$f_X(x) = \Pr[X = x], \qquad F_X(x) = \Pr[X \le x].$$

Bernoulli Distribution

💡 A random variable $X$ with $W_X = \{0, 1\}$ and density

$$f_X(x) = \begin{cases} p & \text{for } x = 1, \\ 1 - p & \text{for } x = 0 \end{cases}$$

is called Bernoulli distributed. The parameter $p$ is called the probability of success of the Bernoulli distribution.

If a random variable $X$ is Bernoulli distributed with parameter $p$, then it is denoted by

$$X \sim \mathrm{Bernoulli}(p).$$

For a Bernoulli distributed random variable $X$ the following hold:

$$\mathbb{E}[X] = p \qquad \text{and} \qquad \mathrm{Var}[X] = p(1 - p).$$

Binomial Distribution

💡 A random variable $X$ with $W_X = \{0, 1, \dots, n\}$ and density

$$f_X(x) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x \in W_X$$

is called binomially distributed. The parameter $n$ is called the number of trials, the parameter $p$ is called the probability of success of the binomial distribution.

If a random variable $X$ is binomially distributed with parameters $n$ and $p$, then it is denoted by

$$X \sim \mathrm{Bin}(n, p).$$

For a binomially distributed random variable $X$ the following hold:

$$\mathbb{E}[X] = np \qquad \text{and} \qquad \mathrm{Var}[X] = np(1 - p).$$

Geometric Distribution

💡 A random variable $X$ with $W_X = \mathbb{N}$ and density

$$f_X(i) = p (1 - p)^{i - 1} \quad \text{for } i \in \mathbb{N}$$

is called geometrically distributed. The parameter $p$ is called the probability of success of the geometric distribution.

If a random variable $X$ is geometrically distributed with parameter $p$, then it is denoted by

$$X \sim \mathrm{Geo}(p).$$

For a geometrically distributed random variable $X$ the following hold:

$$\mathbb{E}[X] = \frac{1}{p} \qquad \text{and} \qquad \mathrm{Var}[X] = \frac{1 - p}{p^2}.$$

📖 (Memorylessness). If $X \sim \mathrm{Geo}(p)$, then for all $x, y \in \mathbb{N}$ the following holds:

$$\Pr[X > y + x \mid X > x] = \Pr[X > y].$$
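
Memorylessness follows from $\Pr[X > k] = (1 - p)^k$ ($k$ initial failures); a symbolic sketch with an assumed $p = 1/3$:

```python
from fractions import Fraction

# X ~ Geo(p): the event X > k means k initial failures, so Pr[X > k] = (1-p)^k.
p = Fraction(1, 3)

def tail(k):
    return (1 - p) ** k

# Memorylessness: Pr[X > y + x | X > x] = Pr[X > y + x] / Pr[X > x] = Pr[X > y]
x, y = 4, 7
assert tail(y + x) / tail(x) == tail(y)
```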

Waiting for the $n$-th Success - Negative Binomial Distribution

💡 Let $Z$ be the random variable that counts how often we have to repeat an experiment with probability of success $p$ until the $n$-th success. For $n = 1$, $Z \sim \mathrm{Geo}(p)$. For general $n$, $Z$ is called negatively binomially distributed with order $n$.

The density of $Z$ is

$$f_Z(z) = \binom{z - 1}{n - 1} p^n (1 - p)^{z - n} \quad \text{for } z \ge n.$$

Let $X_i$ denote the number of experiments strictly after the $(i-1)$-st success up until (and including) the $i$-th success. Then each of the $X_i$ is geometrically distributed with parameter $p$, and $Z = X_1 + \dots + X_n$.

If a random variable $Z$ is negatively binomially distributed with parameters $n$ and $p$, then it is denoted by

$$Z \sim \mathrm{NegBin}(n, p).$$

Thus, by linearity of the expected value, it holds for the expected value that

$$\mathbb{E}[Z] = \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{n}{p}.$$

Application: Coupon-Collector Problem

Coupon-Collector Problem
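
The classical result for the coupon-collector problem, $\mathbb{E}[X] = n \cdot H_n$ for $n$ distinct coupons, follows from the same phase decomposition into geometric waiting times; a symbolic sketch with an assumed $n = 10$:

```python
from fractions import Fraction

# Coupon collector with n distinct coupons: while i coupons are still missing,
# each draw succeeds with probability i/n, so that phase is Geo(i/n) with
# expectation n/i. Linearity sums the phases: E[X] = sum_i n/i = n * H_n.
n = 10
e_collect = sum(Fraction(n, i) for i in range(1, n + 1))
h_n = sum(Fraction(1, i) for i in range(1, n + 1))
assert e_collect == n * h_n
print(e_collect)  # 7381/252
```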

Poisson Distribution

💡 A random variable $X$ with $W_X = \mathbb{N}_0$ and density

$$f_X(i) = \frac{e^{-\lambda} \lambda^i}{i!} \quad \text{for } i \in \mathbb{N}_0$$

is called Poisson distributed. The parameter $\lambda$ is equal to both the mean and the variance of the Poisson distribution.

If a random variable $X$ is Poisson distributed with parameter $\lambda$, then it is denoted by

$$X \sim \mathrm{Po}(\lambda).$$

For a Poisson distributed random variable $X$ the following hold:

$$\mathbb{E}[X] = \lambda \qquad \text{and} \qquad \mathrm{Var}[X] = \lambda.$$

Poisson Distribution as Limit of Binomial Distribution

💡 The binomial distribution $\mathrm{Bin}(n, \frac{\lambda}{n})$ converges towards the Poisson distribution $\mathrm{Po}(\lambda)$ for $n \to \infty$.
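
The pointwise convergence can be observed numerically; the values $\lambda = 2$, $k = 3$ and $n = 10000$ are illustrative assumptions:

```python
import math

# Compare the pmf of Bin(n, lambda/n) at a fixed k with the Po(lambda) pmf
# for a large n: the two should be close.
lam, k = 2.0, 3

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

poisson_pmf = math.exp(-lam) * lam**k / math.factorial(k)
approx = binom_pmf(10_000, lam / 10_000, k)
assert abs(approx - poisson_pmf) < 1e-3
```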

Multiple Random Variables

💡 We are interested in random variables $X$ and $Y$ and probabilities of the form

$$\Pr[X = x, Y = y] = \Pr[\{\omega \in \Omega \mid X(\omega) = x \text{ and } Y(\omega) = y\}].$$

💡 The function

$$f_{X,Y}(x, y) := \Pr[X = x, Y = y]$$

is called the joint density of the random variables $X$ and $Y$.

If the joint density is given, one can extract the densities of the individual random variables using

$$f_X(x) = \sum_{y \in W_Y} f_{X,Y}(x, y) \qquad \text{and} \qquad f_Y(y) = \sum_{x \in W_X} f_{X,Y}(x, y).$$

The functions $f_X$ and $f_Y$ are called marginal densities.

💡 The function

$$F_{X,Y}(x, y) := \Pr[X \le x, Y \le y] = \sum_{x' \le x} \sum_{y' \le y} f_{X,Y}(x', y')$$

is called the joint distribution of the random variables $X$ and $Y$.

If the joint distribution is given, one can extract the distributions of the individual random variables using

$$F_X(x) = \sum_{x' \le x} \sum_{y \in W_Y} f_{X,Y}(x', y) \qquad \text{and} \qquad F_Y(y) = \sum_{y' \le y} \sum_{x \in W_X} f_{X,Y}(x, y').$$

The functions $F_X$ and $F_Y$ are called marginal distributions.

Independence of Random Variables

💡 Random variables $X_1, \dots, X_n$ are called independent, if and only if it holds for all $(x_1, \dots, x_n) \in W_{X_1} \times \dots \times W_{X_n}$ that

$$\Pr[X_1 = x_1, \dots, X_n = x_n] = \Pr[X_1 = x_1] \cdots \Pr[X_n = x_n].$$

📌 For independent random variables $X_1, \dots, X_n$ and arbitrary sets $S_1, \dots, S_n$ with $S_i \subseteq W_{X_i}$ it holds that

$$\Pr[X_1 \in S_1, \dots, X_n \in S_n] = \Pr[X_1 \in S_1] \cdots \Pr[X_n \in S_n].$$

📎 For independent random variables $X_1, \dots, X_n$ and an index set $\{i_1, \dots, i_k\} \subseteq \{1, \dots, n\}$, the random variables $X_{i_1}, \dots, X_{i_k}$ are also independent.

📖 Let $f_1, \dots, f_n$ be real-valued functions ($f_i : \mathbb{R} \to \mathbb{R}$ for $i = 1, \dots, n$). If the random variables $X_1, \dots, X_n$ are independent, then so are $f_1(X_1), \dots, f_n(X_n)$.

Composite Random Variables

📖 For two independent random variables $X$ and $Y$ let $Z := X + Y$. It holds that

$$f_Z(z) = \sum_{x \in W_X} f_X(x) \cdot f_Y(z - x).$$

💡 The expression $f_Z = f_X * f_Y$ is called convolution, analogously to the corresponding operation on the coefficients of power series.
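
The convolution formula can be evaluated directly; the two-dice example is an illustrative assumption:

```python
from fractions import Fraction

# Convolution of two fair-die densities: f_Z(z) = sum_x f_X(x) * f_Y(z - x).
fx = {k: Fraction(1, 6) for k in range(1, 7)}

fz = {}
for x, px in fx.items():
    for y, py in fx.items():
        fz[x + y] = fz.get(x + y, Fraction(0)) + px * py

# The density of the sum of two dice is triangular, peaking at 7.
assert fz[7] == Fraction(1, 6)
assert sum(fz.values()) == 1
```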

Moments of Composite Random Variables

📖 (Linearity of the Expected Value).

For random variables $X_1, \dots, X_n$ and $X := a_1 X_1 + \dots + a_n X_n$ with $a_1, \dots, a_n \in \mathbb{R}$ it holds that

$$\mathbb{E}[X] = a_1 \mathbb{E}[X_1] + \dots + a_n \mathbb{E}[X_n].$$

📖 (Multiplicativity of the Expected Value).

For independent random variables $X_1, \dots, X_n$ it holds that

$$\mathbb{E}[X_1 \cdots X_n] = \mathbb{E}[X_1] \cdots \mathbb{E}[X_n].$$

📖 For independent random variables $X_1, \dots, X_n$ and $X := X_1 + \dots + X_n$ it holds that

$$\mathrm{Var}[X] = \mathrm{Var}[X_1] + \dots + \mathrm{Var}[X_n].$$

Wald’s Identity

📖 (Wald’s Identity).

Let $N$ and $X$ be two independent random variables, where $W_N \subseteq \mathbb{N}_0$ holds for the codomain of $N$. Furthermore, let

$$Z := \sum_{i=1}^{N} X_i,$$

where $X_1, X_2, \dots$ are independent copies of $X$. Then the following holds:

$$\mathbb{E}[Z] = \mathbb{E}[N] \cdot \mathbb{E}[X].$$
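
Wald's identity can be verified exactly via the law of total expectation; the concrete choices $N \sim$ fair die and $X_i \sim \mathrm{Bernoulli}(1/2)$ are illustrative assumptions:

```python
from fractions import Fraction

# Wald's identity E[Z] = E[N] * E[X] for Z = X_1 + ... + X_N,
# with N independent of the i.i.d. X_i. Here: N ~ fair die, X_i ~ Bernoulli(1/2).
fn = {n: Fraction(1, 6) for n in range(1, 7)}
e_x = Fraction(1, 2)

# Condition on N = n (law of total expectation); since N is independent of the
# X_i, linearity gives E[Z | N = n] = n * E[X].
e_z = sum(p * n * e_x for n, p in fn.items())
e_n = sum(n * p for n, p in fn.items())
assert e_z == e_n * e_x == Fraction(7, 4)
```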

Estimating Probabilities

Inequalities of Markov and Chebyshev

📖 (Markov’s Inequality).

Let $X$ be a random variable that only assumes non-negative values. Then it holds for all $t \in \mathbb{R}$ with $t > 0$ that

$$\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.$$

Or equivalently: $\Pr[X \ge t \cdot \mathbb{E}[X]] \le \frac{1}{t}$.

📖 (Chebyshev’s Inequality).

Let $X$ be a random variable and $t \in \mathbb{R}$ with $t > 0$. It then holds that

$$\Pr[|X - \mathbb{E}[X]| \ge t] \le \frac{\mathrm{Var}[X]}{t^2}.$$

Or equivalently:

$$\Pr[|X - \mathbb{E}[X]| \ge t \cdot \sqrt{\mathrm{Var}[X]}] \le \frac{1}{t^2}.$$
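
Both bounds can be compared against exact tail probabilities on a small example; the fair die and the thresholds are illustrative assumptions:

```python
from fractions import Fraction

# Fair die: evaluate both tails exactly and compare against the bounds.
density = {k: Fraction(1, 6) for k in range(1, 7)}
e = sum(k * p for k, p in density.items())                 # E[X] = 7/2
var = sum((k - e)**2 * p for k, p in density.items())      # Var[X] = 35/12

# Markov (X is non-negative): Pr[X >= t] <= E[X] / t
t = 5
pr_ge = sum(p for k, p in density.items() if k >= t)       # exact tail = 1/3
assert pr_ge <= e / t                                      # 1/3 <= 7/10

# Chebyshev: Pr[|X - E[X]| >= t] <= Var[X] / t^2
t = 2
pr_dev = sum(p for k, p in density.items() if abs(k - e) >= t)  # exact = 1/3
assert pr_dev <= var / t**2                                # 1/3 <= 35/48
```

Both bounds hold but are loose here; Chernoff bounds below are much sharper for sums of many independent variables.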

Chernoff’s Inequality

📖 (Chernoff Bounds).

Let $X_1, \dots, X_n$ be independent Bernoulli distributed random variables with $\Pr[X_i = 1] = p_i$ and $\Pr[X_i = 0] = 1 - p_i$.

For $X := \sum_{i=1}^{n} X_i$ and $\mu := \mathbb{E}[X] = \sum_{i=1}^{n} p_i$ the following hold:

  1. $\Pr[X \ge (1 + \delta)\mu] \le e^{-\frac{\mu \delta^2}{3}}$ for all $0 < \delta \le 1$,
  2. $\Pr[X \le (1 - \delta)\mu] \le e^{-\frac{\mu \delta^2}{2}}$ for all $0 < \delta \le 1$,
  3. $\Pr[X \ge t] \le 2^{-t}$ for $t \ge 2e\mu$.
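
The upper-tail bound can be checked against an exact binomial tail; the parameters $n = 20$, $p = 1/2$, $\delta = 1/2$ are illustrative assumptions:

```python
import math

# X = X_1 + ... + X_n with X_i ~ Bernoulli(1/2) independent, so mu = n/2.
# Compare Pr[X >= (1 + d) mu] against the bound exp(-mu d^2 / 3).
n, p = 20, 0.5
mu = n * p
d = 0.5  # (1 + d) mu = 15

# Exact binomial tail Pr[X >= 15]:
tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(15, n + 1))
bound = math.exp(-mu * d**2 / 3)
assert tail <= bound
```

The exact tail (about $0.02$) is well below the bound (about $0.43$); the exponential decay in $\mu$ is what makes Chernoff bounds so much stronger than Markov or Chebyshev for such sums.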

This project is maintained by D3PSI