Data Science Texts

Discover what you don't know, and attack your weaknesses!

Elementary Machine Learning

Machine learning is what you get when you fuse probabilistic intuition with a lot of computational power and a lot of data. Introductory texts tend to focus on two main branches of machine learning: supervised and unsupervised learning. Supervised learning strives to predict an output from an input, given a collection of training instances that pair the two. Unsupervised learning is much less straightforward. It strives to distill the major structure of complicated data, where structure can mean many things, such as clusters or a low-dimensional representation. Introductory machine learning texts tend to be catalogs of techniques. They give you a basic introduction to a large number of methods, but they cover each one in limited depth.
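
To make the contrast between the two branches concrete, here is a minimal sketch using only the Python standard library. It is not taken from either book below. The function names (`fit_line`, `two_means`) and the toy data are hypothetical, and the two techniques chosen (a least-squares line fit and a two-cluster k-means on one-dimensional points) are just simple stand-ins for supervised and unsupervised learning.

```python
# Toy contrast between supervised and unsupervised learning.
# All names and data here are hypothetical illustrations.

def fit_line(xs, ys):
    """Supervised: least-squares fit of y = a*x + b from labeled (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def two_means(points, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two clusters (k-means, k=2)."""
    centers = [min(points), max(points)]  # simple deterministic initialization
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Supervised: each training instance pairs an input with a known output.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = a * 4.0 + b  # predict the output for an unseen input

# Unsupervised: no outputs are given; the algorithm only looks for structure.
centers = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
```

In the supervised half, the known outputs tell the fit what "correct" means. In the unsupervised half, nothing does, so the result depends on what notion of structure the method assumes (here, two compact groups).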

Recommended Books

  1. Machine Learning: A Probabilistic Perspective

    Kevin P. Murphy

    Key Topics

    • Bayes Nets
    • Bayesian Models
    • Bayesian Statistics
    • Boosting
    • Classification and Regression Trees
    • Clustering
    • Deep Learning
    • Expectation Maximization Algorithm
    • Frequentist Statistics
    • Gaussian Models
    • Gaussian Processes
    • Generalized Linear Models
    • Graphical Models
    • Hidden Markov Models
    • Kernels
    • Linear Regression
    • Logistic Regression
    • Markov Chain Monte Carlo
    • Markov Models
    • Markov Random Fields
    • Mixture Models
    • Probability
    • Sparse Linear Models
    • State Space Models
    • Supervised Learning
    • Support Vector Machines
    • Unsupervised Learning
    • Variational Inference

    Description

    We're not sure how one person managed to write this book. It covers a huge range of topics, from a comparison of frequentist and Bayesian statistics to deep learning. Plus, the figures are great! However, some sections could use a lot more exercises, and the book is not very coherent in places. Murphy provides the code he uses for the figures, as well as some other examples. Unfortunately, the code is written for MATLAB. Despite these issues, this is still our favorite introduction to machine learning because it is such a comprehensive survey.

  2. The Elements of Statistical Learning

    Trevor Hastie, Robert Tibshirani, Jerome Friedman

    Key Topics

    • Additive Models
    • Basis Expansion
    • Boosting
    • Clustering
    • Cross Validation
    • Density Estimation
    • High-Dimensional Problems
    • Kernel Smoothing
    • Linear Classification Methods
    • Linear Discriminant Analysis
    • Linear Regression
    • Logistic Regression
    • Model Assessment and Selection
    • Nearest Neighbor Methods
    • Neural Networks
    • Random Forests
    • Shrinkage
    • Supervised Learning
    • Support Vector Machines
    • The Bootstrap
    • The Expectation Maximization Algorithm
    • Tree-Based Methods
    • Undirected Graphical Models
    • Unsupervised Learning

    Description

    This is perhaps the canonical introductory text on machine learning. It focuses on supervised learning. The sections on linear models are particularly good, which makes sense because that's what some of the authors are famous for. It's fairly mathematical, but it's not at all self-contained. Even those with a strong mathematics background are likely to struggle with some of the exercises. There is some code provided by the authors, but it's only for a handful of subjects. This is a respectable text, and there's no reason not to check it out as the authors provide a free electronic version.