
Normal distribution


The normal or Gaussian distribution is a ubiquitous and extremely important probability distribution in statistics. It is actually a family of distributions of the same general form, differing only in their location and scale parameters: the mean and the standard deviation. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one.


Probability density function

The probability density function of the normal distribution with mean μ and standard deviation σ (or variance σ²) is also known as the Gaussian function:

<math>f(x) = {1 \over \sigma\sqrt{2\pi} }\,e^{-{(x-\mu )^2 \over 2\sigma^2}}</math>
If a random variable X follows this distribution, we write X ~ N(μ, σ²). If μ = 0 and σ = 1, we talk about the standard normal distribution, with formula

<math>f(x) = {1 \over \sqrt{2\pi} }\,e^{-{x^2 \over 2}}</math>
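The density formula translates directly into code. The sketch below is a plain transcription of the general formula, with the standard case obtained from the defaults μ = 0, σ = 1; the function name normal_pdf is our own, not part of any library.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, transcribed from the formula above."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Peak of the standard normal density, 1/sqrt(2*pi)
print(round(normal_pdf(0.0), 6))  # 0.398942
```

Halving σ doubles the peak height, since σ appears in the normalizing factor: normal_pdf(3.0, 3.0, 0.5) equals 2 * normal_pdf(0.0).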

[Figure: graph of the probability density function of the standard normal distribution]

This picture is the graph of the probability density function of the standard normal distribution. The distribution is symmetric about its mean, and its bell-like shape has led to it being called the bell curve. About 68.27% of the area under the curve lies within one standard deviation of the mean, about 95.45% within two standard deviations, and about 99.73% within three standard deviations (the "68-95-99.7 rule"). The inflection points of the curve occur at one standard deviation away from the mean.

These statements are also true for non-standard normal distributions.
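The percentages in the 68-95-99.7 rule can be checked numerically. For any normal X, P(|X − μ| < kσ) = P(|Z| < k) = erf(k/√2), so a sketch needs only Python's math.erf; the helper name within_k_sigma is hypothetical.

```python
import math

def within_k_sigma(k: float) -> float:
    """P(|X - mu| < k * sigma) for any normal X, which equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, f"{within_k_sigma(k):.4f}")
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because the result depends only on k, the same numbers hold for every normal distribution, not just the standard one.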

Standardizing Gaussian random variables

If X is a Gaussian random variable with mean μ and variance σ², then

<math> Z = \frac{X - \mu}{\sigma} </math>

is a standard normal random variable: Z~N(0,1). Conversely, if Z is a standard normal random variable,

<math>X=\sigma Z+\mu</math>

is a Gaussian random variable with mean μ and variance σ².

The standard normal distribution has been tabulated, and every other normal distribution is a simple transformation of the standard one. Therefore, if one knows the mean and the standard deviation of a normal distribution, one can standardize and use the standard normal table to answer any question about probabilities under that distribution.
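The table lookup can be imitated in code. In this sketch, Python's statistics.NormalDist (available since Python 3.8) with μ = 0, σ = 1 stands in for the printed table; the helper prob_between and the parameters μ = 100, σ = 15 are illustrative assumptions. The question is answered only through the standard normal, then cross-checked against the non-standard distribution directly.

```python
from statistics import NormalDist

STANDARD = NormalDist(0.0, 1.0)  # plays the role of the N(0, 1) table

def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(mu, sigma^2), answered via the standard normal only."""
    za = (a - mu) / sigma  # standardize both endpoints: Z = (X - mu) / sigma
    zb = (b - mu) / sigma
    return STANDARD.cdf(zb) - STANDARD.cdf(za)

# Hypothetical parameters: mu = 100, sigma = 15
p = prob_between(85.0, 130.0, 100.0, 15.0)
direct = NormalDist(100.0, 15.0).cdf(130.0) - NormalDist(100.0, 15.0).cdf(85.0)
print(f"{p:.4f} {direct:.4f}")  # 0.8186 0.8186
```

The endpoints 85 and 130 standardize to z = −1 and z = 2, so the answer is Φ(2) − Φ(−1).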

Occurrence

Approximately normal distributions occur in many situations as a result of the central limit theorem. Simply stated, this theorem says that adding up a large number of small independent random variables results in an approximately normal distribution. Therefore, whenever there is reason to suspect the presence of a large number of small effects acting additively, it is reasonable to assume that observations will be approximately normal. The IQ score of an individual, for example, can be seen as the result of many small additive influences: many genes and many environmental factors all play a role.
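A small simulation illustrates the additive case. As an assumed toy model, each observation is the sum of 12 independent Uniform(0, 1) effects, centred and scaled to mean 0 and variance 1 (each uniform has mean 1/2 and variance 1/12). If the central limit theorem applies, roughly 68% of the results should fall within one unit of zero. The helper sum_of_uniforms and the seed are our own choices.

```python
import random
import statistics

def sum_of_uniforms(n_terms: int, rng: random.Random) -> float:
    """Sum of n_terms Uniform(0, 1) draws, standardized to mean 0 and variance 1."""
    s = sum(rng.random() for _ in range(n_terms))
    # each Uniform(0, 1) term has mean 1/2 and variance 1/12
    return (s - n_terms / 2.0) / (n_terms / 12.0) ** 0.5

rng = random.Random(12345)  # fixed seed so the run is reproducible
samples = [sum_of_uniforms(12, rng) for _ in range(100_000)]
mean = statistics.fmean(samples)
sd = statistics.pstdev(samples)
frac_within_1 = sum(abs(x) < 1 for x in samples) / len(samples)
# expect values close to 0, 1 and 0.683 respectively
print(f"mean={mean:.3f} sd={sd:.3f} within 1 sd={frac_within_1:.3f}")
```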

It is important to realize, however, that small effects often act as multiplicative (rather than additive) increases. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal. Good examples of this behaviour are financial indicators such as interest rates or stock values. Also, in biology it has been observed that organism growth sometimes proceeds by multiplicative rather than additive increments, implying that the distribution of body sizes should be log-normal.
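The multiplicative case can be sketched the same way. Under a hypothetical growth model in which each of 100 small effects multiplies a size by (1 + u), with u uniform on (−0.1, 0.1), the logarithm of the final size is a sum of many small terms. The sizes themselves should therefore come out clearly right-skewed, while their logarithms should be nearly symmetric. The model, the seed, and the helper skewness are assumptions made for illustration.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(2003)
# Hypothetical model: 100 multiplicative effects of (1 + u), u ~ Uniform(-0.1, 0.1)
sizes = [math.prod(1.0 + rng.uniform(-0.1, 0.1) for _ in range(100)) for _ in range(10_000)]
log_sizes = [math.log(x) for x in sizes]
print(f"skewness of sizes:     {skewness(sizes):.2f}")      # clearly positive
print(f"skewness of log-sizes: {skewness(log_sizes):.2f}")  # close to zero
```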

Other examples of variables that are not normally distributed:

Further properties

If X ~ N(μ, σ²) and a and b are real numbers, then aX + b ~ N(aμ + b, (aσ)²).

If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²), and X₁ and X₂ are independent, then X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²).

If X₁, ..., Xₙ are independent standard normal variables, then X₁² + ... + Xₙ² follows a chi-squared distribution with n degrees of freedom.
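The two summation properties can be checked by simulation. The sketch below uses Python's random.Random.gauss(mu, sigma) with parameters chosen purely for illustration. It compares the sample mean and variance of X₁ + X₂ with μ₁ + μ₂ and σ₁² + σ₂², and compares those of a sum of five squared standard normals with n and 2n, the known mean and variance of a chi-squared distribution with n degrees of freedom.

```python
import random
import statistics

rng = random.Random(7)
N = 50_000

# Sum of independent normals: X1 ~ N(1, 2^2), X2 ~ N(-3, 1.5^2)  (illustrative parameters)
sums = [rng.gauss(1.0, 2.0) + rng.gauss(-3.0, 1.5) for _ in range(N)]
print(f"mean ~ {statistics.fmean(sums):.2f} (theory -2.00)")
print(f"var  ~ {statistics.pvariance(sums):.2f} (theory 6.25)")

# Chi-squared with n = 5 degrees of freedom: sum of squares of 5 standard normals
n = 5
chi2 = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(N)]
print(f"mean ~ {statistics.fmean(chi2):.2f} (theory {n})")
print(f"var  ~ {statistics.pvariance(chi2):.2f} (theory {2 * n})")
```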

Characteristic function

The characteristic function of a Gaussian random variable X ~ N(μ, σ²) is defined as the expected value of <math>e^{itX}</math> and can be written as

<math>\phi_X(t)=E\left[e^{itX}\right]=\int_{-\infty}^{\infty} {1 \over \sigma\sqrt{2\pi} }\,e^{-{(x-\mu )^2 \over 2\sigma^2}}\,e^{itx}\,dx = e^{i\mu t-\sigma^2 t^2/2}</math>

as can be seen by completing the square in the exponent.
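The completing-the-square step can be spelled out as follows. Substitute x = μ + σz, collect the exponent, and complete the square in z. The last integral equals 1 because it is a standard normal density shifted by the constant iσt. Moving the path of integration back to the real axis is justified by Cauchy's theorem, a step the argument above leaves implicit.

<math>\phi_X(t) = \int_{-\infty}^{\infty} {1 \over \sqrt{2\pi} }\,e^{-z^2/2}\,e^{it(\mu+\sigma z)}\,dz = e^{i\mu t}\int_{-\infty}^{\infty} {1 \over \sqrt{2\pi} }\,e^{-z^2/2 + i\sigma t z}\,dz</math>

<math>-{z^2 \over 2} + i\sigma t z = -{(z - i\sigma t)^2 \over 2} - {\sigma^2 t^2 \over 2}</math>

<math>\phi_X(t) = e^{i\mu t-\sigma^2 t^2/2}\int_{-\infty}^{\infty} {1 \over \sqrt{2\pi} }\,e^{-(z - i\sigma t)^2/2}\,dz = e^{i\mu t-\sigma^2 t^2/2}</math>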

Generating Gaussian random variables

For computer simulations, it is often necessary to generate values that follow a Gaussian distribution. A simple and widely used method is the Box-Muller transform. It takes two independent values uniformly distributed on (0, 1), which can easily be generated by the computer's pseudorandom number generator, and produces two independent standard normal values.
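A minimal sketch of the basic (trigonometric) Box-Muller transform in Python follows. It uses random.Random only as the source of uniform values. Non-standard values come from X = σZ + μ, as in the standardization section. The function names are hypothetical, and this is meant as an illustration rather than a replacement for a library generator such as random.gauss.

```python
import math
import random
import statistics
from typing import Tuple

def box_muller(rng: random.Random) -> Tuple[float, float]:
    """Map two independent Uniform(0, 1) values to two independent N(0, 1) values."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so that log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))  # radius
    theta = 2.0 * math.pi * u2          # angle, uniform on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

def normal_sample(mu: float, sigma: float, rng: random.Random) -> float:
    """One N(mu, sigma^2) value, using X = sigma * Z + mu."""
    z, _ = box_muller(rng)  # second value discarded only to keep the sketch short
    return sigma * z + mu

rng = random.Random(42)
zs = [v for _ in range(50_000) for v in box_muller(rng)]
print(f"mean={statistics.fmean(zs):.3f} sd={statistics.pstdev(zs):.3f}")  # near 0 and 1
```

A production generator would normally keep the second value for the next call instead of discarding it.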

History

The normal distribution was first introduced by de Moivre in an article in 1733 (reprinted in the second edition of his Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the Theorem of de Moivre-Laplace.

Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it in 1809 by assuming a normal distribution of the errors.

The name "bell curve" goes back to Jouffret, who used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875 [Stigler]. This terminology is unfortunate, since it reflects and encourages the fallacy that "everything is Gaussian".

Cumulative distribution function of the normal distribution

The following graph shows the probabilities that a given standard normal variable has a value less than z, for values of z from -4 to +4. This is known as the cumulative distribution function of the normal distribution, and has formula

<math>\Phi(z) = \int_{-\infty}^z {1 \over \sqrt{2\pi} }\,e^{-{x^2 \over 2}}\,dx</math>

[Figure: cumulative distribution function of the standard normal distribution, for z from -4 to +4]

So, for instance, the probability that a standard normal variable has a value less than 0.12 is 0.54776. The cumulative distribution function of the normal distribution cannot be written in closed form in terms of elementary functions and has to be calculated using numerical techniques. It is closely related to the error function erf: Φ(z) = [1 + erf(z/√2)] / 2.
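In practice, Φ is usually evaluated through an error-function routine rather than by direct numerical integration. A minimal sketch using Python's math.erf and the relation Φ(z) = [1 + erf(z/√2)] / 2 reproduces the value quoted for z = 0.12. The function name Phi is our own.

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF via the error function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(f"{Phi(0.12):.5f}")  # 0.54776
```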

External links and references

wikipedia.org dumped 2003-03-17 with terodump