Data Science Texts

Discover what you don't know, and attack your weaknesses!

Principal Component Analysis

Strongly Recommended Prerequisites

Recommended Prerequisites

Principal component analysis (PCA) is a multivariate technique designed to reduce a high-dimensional problem to a lower-dimensional one. The basic idea is that only the axes along which the data points have high variance are kept, and the others are discarded. Frankly, if you don't already know what principal component analysis is, you probably don't need a dedicated book. However, if you use or are planning to use principal component analysis, there are many subtle issues that can arise. You will save a lot of time by learning about those issues from a book rather than dealing (or failing to deal) with them yourself.
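To make the basic idea concrete, here is a minimal sketch in Python with NumPy. It is not taken from any of the books below. The function name pca, the synthetic data, and the choice to compute the axes from a singular value decomposition of the centered data are all our own assumptions for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the n_components axes of highest variance.

    Hypothetical helper for illustration; real PCA packages offer more options
    (scaling, whitening, randomized solvers, and so on).
    """
    X = np.asarray(X, dtype=float)
    # Center each column so that variance is measured about the column mean.
    Xc = X - X.mean(axis=0)
    # Thin SVD of the centered data: the rows of Vt are orthonormal axes,
    # ordered by decreasing singular value (and hence decreasing variance).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample variance of the data along each axis (n - 1 denominator).
    explained_variance = s**2 / (X.shape[0] - 1)
    # Keep only the high-variance axes and discard the rest.
    components = Vt[:n_components]
    scores = Xc @ components.T
    return scores, components, explained_variance


# Synthetic data, made up for this sketch: three columns, but almost all of
# the variation lies along a single direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

scores, components, explained_variance = pca(X, n_components=1)

# Fraction of the total variance carried by each axis, largest first.
ratio = explained_variance / explained_variance.sum()
print(ratio)
```

Here the three-column data set is reduced to a single column of scores, and ratio shows how much of the total variance each axis carries. Note that this sketch only centers the columns. Whether to also rescale them to unit variance is one of the subtle issues mentioned above.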

Recommended Books

  1. Principal Component Analysis

    I.T. Jolliffe

    Key Topics

    • Biplots
    • Canonical Correlation Analysis
    • Factor Analysis
    • Functional PCA
    • Generalizations and Adaptations of Principal Component Analysis
    • Geometry of Principal Components
    • Interpretation of Principal Components
    • Interpreting Principal Components
    • Outlier Detection
    • Population Principal Components
    • Principal Component Analysis for Time Series
    • Principal Component Analysis for non-Gaussian Data
    • Principal Component Regression
    • Principal Component Subset Selection
    • Principal Components Used with Other Multivariate Techniques
    • Projection Pursuit
    • Robust Estimation
    • Rotation of Principal Components
    • Sample Principal Components

    Description

    We're not exaggerating when we say this is one of the best applied statistics books. There are a surprisingly large number of good books on principal component analysis given that it is a somewhat niche technique, but unfortunately for them they must all compete with this book. The mathematical introduction is succinct and precise, the coverage of practical issues is exhaustive in our experience (we found it somewhat frustrating to find our own supposed insights already in print!), and there are many useful extensions of principal component analysis presented. The major drawback of this book is that it has no code examples or exercises. It's pretty easy to follow along, though, as PCA packages are common. You will just need some appropriate data (any tabular data with continuous columns will work for most of the book).