idlastro / Math and Statistics: PCA

[Source code]

NAME
PCA
PURPOSE
Carry out a Principal Components Analysis (Karhunen-Loeve Transform)
EXPLANATION
Results can be directed to the screen, a file, or output variables
See notes below for comparison with the intrinsic IDL function PCOMP.
CALLING SEQUENCE
PCA, data, eigenval, eigenvect, percentages, proj_obj, proj_atr, 
         [MATRIX =, TEXTOUT = ,/COVARIANCE, /SSQ, /SILENT ]
INPUT PARAMETERS
data -  2-d data matrix, data(i,j) contains the jth attribute value
          for the ith object in the sample.    If N_OBJ is the total
          number of objects (rows) in the sample, and N_ATTRIB is the 
          total number of attributes (columns) then data should be
          dimensioned N_OBJ x N_ATTRIB.         
OPTIONAL INPUT KEYWORD PARAMETERS
/COVARIANCE - if this keyword is set, then the PCA will be carried out
         on the covariance matrix (rare), the default is to use the
         correlation matrix
/SILENT - If this keyword is set, then no output is printed
/SSQ - if this keyword is set, then the PCA will be carried out on
          on the sums-of-squares & cross-products matrix (rare)
TEXTOUT - Controls print output device, defaults to !TEXTOUT
         textout=1       TERMINAL using /more option
         textout=2       TERMINAL without /more option
         textout=3       .prt
         textout=4       laser.tmp
         textout=5      user must open file
         textout = filename (default extension of .prt)
OPTIONAL OUTPUT PARAMETERS
eigenval -  N_ATTRIB element vector containing the sorted eigenvalues
eigenvect - N_ATRRIB x N_ATTRIB matrix containing the corresponding 
          eigenvectors
percentages - N_ATTRIB element containing the cumulative percentage 
        variances associated with the principal components
proj_obj - N_OBJ by N_ATTRIB matrix containing the projections of the 
        objects on the principal components
proj_atr - N_ATTRIB by N_ATTRIB matrix containing the projections of 
          the attributes on the principal components
OPTIONAL OUTPUT PARAMETER
MATRIX   = analysed matrix, either the covariance matrix if /COVARIANCE
        is set, the "sum of squares and cross-products" matrix if
        /SSQ is set, or the (by default) correlation matrix.    Matrix
        will have dimensions N_ATTRIB x N_ATTRIB
NOTES
This procedure performs Principal Components Analysis (Karhunen-Loeve
Transform) according to the method described in "Multivariate Data 
Analysis" by Murtagh & Heck [Reidel : Dordrecht 1987], pp. 33-48.
See  http://www.classification-society.org/csna/mda-sw/pca.f
Keywords /COVARIANCE and /SSQ are mutually exclusive.
The printout contains only (at most) the first seven principle 
eigenvectors.    However, the output variables EIGENVECT contain 
all the eigenvectors
Different authors scale the covariance matrix in different ways.
The eigenvalues output by PCA may have to be scaled by 1/N_OBJ or
1/(N_OBJ-1) to agree with other calculations when /COVAR is set.
PCA uses the non-standard system variables !TEXTOUT and !TEXTUNIT.
These can be added to one's session using the procedure ASTROLIB.
The intrinsic IDL function PCOMP  duplicates most
most of the functionality of PCA, but uses different conventions and
normalizations.   Note the following:
 PCOMP requires a N_ATTRIB x N_OBJ input array; this is the transpose
   of what PCA expects
 PCA uses standardized variables for the correlation matrix:  the input 
  vectors are set to a  mean of zero and variance of one and divided by 
  sqrt(n); use the /STANDARDIZE keyword to PCOMP for a direct comparison.
 PCA (unlike PCOMP) normalizes the eigenvectors by the square root
   of the eigenvalues.
 PCA returns cumulative percentages; the VARIANCES keyword of PCOMP
   returns the variance in each variable
 PCOMP divides the eigenvalues by (1/N_OBJ-1) when the covariance matrix
    is used.
EXAMPLE
Perform a PCA analysis on the covariance matrix of a data matrix, DATA,
and write the results to a file
IDL> PCA, data, /COVAR, t = 'pca.dat'
Perform a PCA analysis on the correlation matrix.   Suppress all 
printing, and save the eigenvectors and eigenvalues in output variables
IDL> PCA, data, eigenval, eigenvect, /SILENT
PROCEDURES CALLED
TEXTOPEN, TEXTCLOSE
REVISION HISTORY
Immanuel Freedman (after Murtagh F. and Heck A.).     December 1993
Wayne Landsman, modified I/O              December 1993
Fix MATRIX output, remove GOTO statements   W. Landsman August 1998      
Changed some index variable to type LONG    W. Landsman March 2000
Fix error in computation of proj_atr, see Jan 1990 fix in 
 http://astro.u-strasbg.fr/~fmurtagh/mda-sw/pca.f   W. Landsman Feb 2008
eturn to user if error
Constants
are array elements near-zero ?
Dispatch table
efine nonstandard system variables if not already present
umber of objects and attributes
form column-means
form column-means
Carry out eigenreduction
D contains diagonal, E contains off-diagonal
D contains the eigen-values, A(*,i) -vectors
Use TOLERANCE to decide if eigenquantities are sufficiently near zero
Order by increasing eigenvalue
Eigenvalues expressed as percentage variance and ...
.. Cumulative percentage variance
efine returned parameters
Output eigen-values and -vectors
Open output file 
Obtain projection of row-point on principal axes (Murtagh & Heck convention)
Use TOLERANCE again...
Obtain projection of column-points on principal axes
Use TOLERANCE again...
scale by square root of eigenvalues...
Close output file