Data Science Texts

Discover what you don't know, and attack your weaknesses!

Multivariate Analysis

Strongly Recommended Prerequisites

Recommended Prerequisites

Multivariate analysis is what people called many machine learning techniques before calling them machine learning became so lucrative. Traditional multivariate analysis emphasizes theory concerning the multivariate normal distribution, techniques based on the multivariate normal distribution, and techniques that don't require a distributional assumption but had better work well for the multivariate normal distribution. Examples include multivariate regression, classification, principal component analysis, ANOVA, ANCOVA, correspondence analysis, and density estimation. Modern multivariate analysis adds powerful nonparametric regressors and classifiers such as neural networks and tree-based techniques.

Recommended Books

  1. Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning

    Alan J. Izenman

    Key Features

    Key Topics

    • Artificial Neural Networks
    • Blind Source Separation
    • Boosting
    • Canonical Correlation Analysis
    • Classification and Regression Trees
    • Clustering
    • Correspondence Analysis
    • Cross-Validation
    • Data Quality Problems
    • Databases
    • Exploratory Data Analysis
    • Hierarchical Clustering
    • Histograms
    • Independent Component Analysis
    • Kernel Density Estimation
    • Kernel PCA
    • Linear Discriminant Analysis
    • Linear Regression
    • Manifold Learning
    • Multidimensional Scaling
    • Multilayer Perceptron
    • Multivariate Gaussian Distribution
    • Multivariate Regression
    • Nonparametric Density Estimation
    • Principal Component Analysis
    • Random Forests
    • Regularized Regression
    • Self-Organizing Maps
    • Singular-Value Decomposition
    • Support Vector Machines
    • The Curse of Dimensionality
    • Variable Selection
    • Vectors and Matrices

    Description

    This book tries to cover a lot of ground. The subtitle Regression, Classification, and Manifold Learning spells out the foci of the book (hypothesis testing is rather neglected). Izenman covers the classical techniques for these three tasks, such as multivariate regression, discriminant analysis, and principal component analysis, as well as many modern techniques, such as artificial neural networks, gradient boosting, and self-organizing maps. Obviously he cannot describe each topic in exhaustive detail, but he delivers the main applied points, and he'll get you interested enough to look for resources dedicated to each topic.

  2. Using Multivariate Statistics

    Barbara G. Tabachnick and Linda S. Fidell

    Key Features

    Key Topics

    • ARIMA Models
    • Analysis of Covariance (ANCOVA)
    • Analysis of Variance (ANOVA)
    • Canonical Correlation Analysis
    • Discriminant Analysis
    • Factor Analysis
    • Generalized Linear Models
    • Logistic Regression
    • Missing Data
    • Multilevel Linear Modeling
    • Multiple Regression
    • Multiway Frequency Analysis
    • Outliers
    • Principal Component Analysis
    • Profile Analysis
    • Repeated Measures
    • Screening Data
    • Structural Equation Modeling
    • Survival Analysis
    • Time Series Analysis

    Description

    This is an outstanding practitioner's guide to classical multivariate analysis. Each technique gets a standalone chapter with the same structure: the sort of questions the technique can answer, the technique's limitations, the fundamental equations involved in using the technique, common issues, and fleshed-out examples that use the technique. There are two infuriating deficiencies, however: there are no exercises, and the code used is SAS or SPSS instead of something free and modern. In a certain respect the two issues cancel out, since reimplementing the examples in a proper language is itself a critical exercise.
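
To give a feel for what such a reimplementation exercise might look like, here is a minimal sketch of a one-way ANOVA F statistic (one of the topics listed above) in plain Python. It is not taken from the book: the function name `one_way_anova_f` and the scores are made up for illustration, and a real exercise would start from the book's own data and compare against its SAS or SPSS output.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of observations per group.
    Hypothetical helper written for illustration only.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three groups (not from the book).
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Working the sums of squares by hand like this, rather than calling a library routine, keeps the exercise close to the "fundamental equations" part of each chapter; the result can then be checked against a statistics package.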