
Principal components analysis

In statistics, principal components analysis (PCA) is a transform used to reduce the dimensionality of a dataset while retaining the characteristics of the dataset that contribute most to its variance. In signal processing it is called the (discrete) Karhunen-Loève transform. It is also called the Hotelling transform.

The first principal component w1 of a dataset x (assumed to have zero mean, so that the expectation below measures variance) can be defined as

<math>\mathbf{w}_1
  = \arg\max_{\Vert \mathbf{w} \Vert = 1} E\left\{ \left( \mathbf{w}^T \mathbf{x}\right)^2 \right\}</math>
Given the first <math>k - 1</math> components, the <math>k</math>-th component can be found by subtracting the projections onto the first <math>k - 1</math> principal components from x:
<math>\mathbf{\hat{x}}_{k - 1}
  = \mathbf{x} -
    \sum_{i = 1}^{k - 1}
      \mathbf{w}_i \mathbf{w}_i^T \mathbf{x}</math>
and then finding the first principal component of this new dataset:
<math>\mathbf{w}_k
  = \arg\max_{\Vert \mathbf{w} \Vert = 1} E\left\{
    \left( \mathbf{w}^T \mathbf{\hat{x}}_{k - 1}
    \right)^2 \right\}.</math>
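
As a concrete illustration of this recursive definition, the following Python/NumPy sketch (the function names here are illustrative, not standard) finds each component as the direction of maximum variance of the deflated data, then subtracts the projection onto it:

 import numpy as np

 def first_component(x):
     # leading eigenvector of the covariance matrix of x, i.e. the
     # unit vector w maximizing E{(w^T x)^2} for zero-mean x
     cov = np.cov(x, rowvar=False)
     eigvals, eigvecs = np.linalg.eigh(cov)
     return eigvecs[:, np.argmax(eigvals)]

 def components_by_deflation(x, k):
     # x: (n_samples, n_features) array, assumed zero-mean
     x_hat = x.copy()
     ws = []
     for _ in range(k):
         w = first_component(x_hat)
         ws.append(w)
         # subtract the projection w w^T x_hat from each sample (deflation)
         x_hat = x_hat - np.outer(x_hat @ w, w)
     return np.column_stack(ws)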

A simpler way to calculate the components <math>\mathbf{w}_i</math> uses the covariance matrix of x, the measurement vector. The eigenvectors of the covariance matrix with the largest eigenvalues correspond to the directions along which the dataset varies most strongly. The original measurements are finally projected onto this reduced vector space.
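
For example, a minimal Python/NumPy sketch of this covariance-based approach (the function name and interface are chosen here for illustration):

 import numpy as np

 def pca_reduce(x, n_components):
     # x: (n_samples, n_features) matrix of measurements
     x_centered = x - x.mean(axis=0)           # remove the mean
     cov = np.cov(x_centered, rowvar=False)    # covariance matrix
     eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric eigendecomposition
     order = np.argsort(eigvals)[::-1]         # largest eigenvalues first
     w = eigvecs[:, order[:n_components]]      # top eigenvectors as columns
     return x_centered @ w                     # project onto reduced space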

Closely related is the method of empirical orthogonal functions (EOF).

Another method of dimensionality reduction is the self-organizing map.

