Introduction

This is the third blog post in the series on understanding and visualizing multivariate Gaussian probability distributions.

Usually, we are given predefined values of the mean $\mu$ and the standard deviation $\sigma$ of a Normal distribution. For a random variable $Z$ we then know the pdf of the Normal distribution to which it belongs: $$Z \sim \mathcal{N}(0,1)$$ $$f(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$$ We can simply substitute various values of $Z$ to obtain the single-valued output of the pdf.

Sampling

The process of sampling from a univariate Normal distribution must be clearly understood in order to draw parallels with it when multivariate distributions are discussed. Here, the mean and the variance of the Normal distribution are single numbers; these will be replaced by vectors and square matrices, respectively, when dealing with the multivariate Normal distribution.

We will assume that we have a Gaussian distribution that we need to sample from. Given just the mean and the variance, we will PRODUCE data.

To do so, we will assume that we have a mechanism that produces uniform samples between 0 and 1 with EQUAL probability. All programming languages provide this through a built-in pseudo-random number generator.

We will also assume that we can compute the CUMULATIVE distribution of the Gaussian. The cumulative is what we get if we start summing up the area below the Gaussian from the left, as shown in the plot. The inflection point is at the mean: the area to its left equals the area to its right. The asymptote of the curve is 1, since the area under the pdf is always 1.

Now we draw a random number from the uniform distribution between 0 and 1. We project this number horizontally onto the graph of the cdf, and from there project the point vertically down onto the x-axis to retrieve a SAMPLE. This is how a computer samples; the process is called inverse cumulative (inverse CDF) mapping. If we do this often, we will get more samples around the mean.
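The inverse-CDF mapping described above can be sketched in a few lines. This is an illustrative sketch, not part of the original post's code: it assumes `scipy` is available and uses `scipy.stats.norm.ppf` as the inverse CDF of the standard normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Step 1: draw uniform samples in (0, 1), each value equally likely
u = rng.uniform(0.0, 1.0, size=100_000)

# Step 2: map each uniform sample through the inverse CDF (ppf) of the
# standard normal -- this is the horizontal/vertical projection on the
# cdf graph, done analytically
z = norm.ppf(u)

# The resulting samples cluster around the mean 0 with unit spread
print(z.mean(), z.std())
```

Because more of the uniform samples land on the steep middle part of the cdf, more of the mapped samples fall near the mean, exactly as described above.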

Specifically, the computer generates samples with ZERO mean and UNIT variance: $Z \sim \mathcal{N}(0,1)$

# importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

def univariate_normal(x, mean, variance):
    """pdf of the univariate normal distribution."""
    return ((1. / np.sqrt(2 * np.pi * variance)) * 
            np.exp(-(x - mean)**2 / (2 * variance)))

# No of Data points
N = 1000  
# initializing random values
data = np.random.randn(N)  
# getting data of the histogram
count, bins_count = np.histogram(data, bins=18)  
# finding the PDF of the histogram using count values
pdf = count / sum(count)  
# using numpy np.cumsum to calculate the CDF
cdf = np.cumsum(pdf)  

fig, ax = plt.subplots()
# plotting PDF and CDF
plt.subplot(221)
#plt.plot(bins_count[1:], pdf, color="red", label="PDF")
plt.plot(bins_count[1:], cdf, label="CDF")
plt.axhline(y = 0.8,xmin =0.62,xmax=0.95,color = 'g', linestyle = '--')
plt.axvline(x = 0.8,ymin =0 ,ymax=0.78,color = 'g', linestyle = '--')

plt.legend()

plt.subplot(222)
plt.xlim(-2,2)
plt.ylim(0,1)
plt.axhline(y=0.8, xmin = 0, xmax=0.75,color ='g',linestyle='--')
plt.axvline(x=0,color ='k',linestyle='-')
plt.axvline(x = 1,color = 'g', linestyle = '-')

plt.subplot(223)
Z = np.linspace(-np.pi, np.pi,300)
mean, variance = 0 ,1
plt.plot(Z , univariate_normal(Z,mean,variance))
plt.axvline( x=0.8, ymin=0, ymax=0.7,color = 'g', linestyle = '--')
circle1 = plt.Circle((0.8, -0.04), 0.05, color='r')
ax.add_patch(circle1)

Essential Statistics of the Univariate Normal

Note: this section can be skipped without any loss of continuity. We will now refresh some basic concepts in probability and derive the equations for the MEAN and the VARIANCE of the univariate normal distribution. $$x \sim \mathcal{N}(\mu,\sigma^2)$$ $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The area below the pdf is always 1

$$ \int_{-\infty}^{\infty}f(x) \,dx = 1 $$
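This normalization can be sanity-checked numerically with a simple Riemann sum over a wide finite interval (the interval $[-10, 10]$ and the grid size below are arbitrary choices; they cover essentially all of the mass of a standard normal):

```python
import numpy as np

def univariate_normal(x, mean, variance):
    """pdf of the univariate normal distribution."""
    return ((1. / np.sqrt(2 * np.pi * variance)) *
            np.exp(-(x - mean)**2 / (2 * variance)))

# Riemann-sum approximation of the integral of the pdf over [-10, 10];
# the tails beyond +/-10 contribute a negligible amount for N(0, 1)
x = np.linspace(-10, 10, 100_001)
dx = x[1] - x[0]
area = np.sum(univariate_normal(x, 0, 1) * dx)
print(area)  # ≈ 1
```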

The expectation of the Normal r.v. is equal to the mean

$$ \mathbb{E}(X)= \mu $$

Variance of the Normal r.v.

Refer to this link for the proof. $$Var(X) = \sigma^2$$
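Both identities, $\mathbb{E}(X) = \mu$ and $Var(X) = \sigma^2$, can be checked empirically by sampling. This is a Monte Carlo check, so the tolerances are loose; the values $\mu = 3$, $\sigma = 2$ below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 3.0, 2.0

# draw many samples from N(mu, sigma^2)
x = rng.normal(mu, sigma, size=200_000)

# the sample mean estimates E[X] = mu, and the
# sample variance estimates Var(X) = sigma^2
print(x.mean())  # ≈ 3.0
print(x.var())   # ≈ 4.0
```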

Translation ( Adding a constant to the Normal distribution )

fig, ax = plt.subplots()
Z = np.linspace(-2*np.pi, 3* np.pi,300)
mean, variance = 0 ,1
k=2
plt.plot(Z , univariate_normal(Z,mean,variance))
plt.axvline( x=variance, ymin=0, ymax=0.6,color = 'g', linestyle = '--')
plt.axvline( x=-variance, ymin=0, ymax=0.6,color = 'g', linestyle = '--')
plt.plot(Z , univariate_normal(Z,mean+k,variance))
plt.axvline( x=variance+k, ymin=0, ymax=0.5,color = 'y', linestyle = '--')
plt.axhline(y=0.4, xmin = 0.4, xmax=0.52,color ='k',linestyle='-')
#plt.axvline( x=-variance-k, ymin=0, ymax=0.6,color = 'y', linestyle = '--')
plt.xlim(-2*np.pi, 3*np.pi)
plt.title("Translation by k  \n$\mu_x = \mu_z + k$ \n $\sigma_x = \sigma_z$")

Scaling of a Normal Distribution

Now consider the original r.v. $X$ sampled from a normal distribution again, and let $k$ and $b$ be constant numbers such that $Y = k\,X + b$. In general, if $X \sim \mathcal{N}(\mu_x,\sigma_x^2)$ and $Y = k\,X + b$, then we can prove that $Y \sim \mathcal{N}(\mu_y,\sigma_y^2)$, where $\mu_y = k\,\mu_x + b$ and $\sigma_y^2 = k^2\,\sigma_x^2$.

Proof:

$$Z \sim \mathcal{N}(0,1)$$

$Z$ is the original r.v. from the pseudo-random number generator, with $\mu_z = 0$ and $\sigma_z = 1$. $$X \sim \mathcal{N}(\mu_x,\sigma_x^2)$$ $$ \therefore X = \sigma_x\;Z + \mu_x$$ Thus $$Y = k\;X + b$$ $$Y = k\; (\sigma_x\;Z + \mu_x) + b$$ $$Y = (k\;\sigma_x)Z\;+\;(k\,\mu_x\;+\;b)$$

where, $$ \mu_y = k\,\mu_x\;+\;b $$ and $$Var(Y) = k^2\;\sigma_x^2 $$
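This derivation can be verified numerically: take standard normal samples $Z$, build $X = \sigma_x Z + \mu_x$ and $Y = kX + b$, and compare the sample mean and variance of $Y$ with $k\mu_x + b$ and $k^2\sigma_x^2$. The constants below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(7)
z = rng.standard_normal(200_000)   # Z ~ N(0, 1), as from the generator

mu_x, sigma_x = 1.5, 0.5           # target X ~ N(mu_x, sigma_x^2)
k, b = 3.0, -2.0                   # affine transform constants

x = sigma_x * z + mu_x             # X = sigma_x * Z + mu_x
y = k * x + b                      # Y = k * X + b

print(y.mean())  # ≈ k * mu_x + b  = 2.5
print(y.var())   # ≈ k^2 * sigma_x^2 = 2.25
```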

Why are standard normal samples, multiplied by a standard deviation, samples from a normal distribution with that standard deviation?