Data Science Texts

Discover what you don't know, and attack your weaknesses!

Elementary Probability & Statistics

Last Updated: 7/18/2019

Elementary probability and statistics books tend to follow the same pattern. They first introduce probability, the mathematics that describes uncertain processes, and then spend most of the remaining pages on statistics, a collection of techniques for determining which probabilistic process is generating real data. Once that process has been determined, questions about future or otherwise unknown outcomes can be answered. Most books in this category contain roughly the same probability material; the differences are in the statistics sections.
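To make the distinction concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not something taken from any of the books below: the coin-flip process, the value `true_p = 0.3`, and the helper names `simulate_flips` and `estimate_p` are all assumptions chosen for the example. Probability runs forward from a known process to data; statistics runs backward from data to an estimate of the process, which can then be used to answer a question about future outcomes.

```python
import random


def simulate_flips(p, n, rng):
    """Probability: generate n outcomes from a known Bernoulli(p) process."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def estimate_p(flips):
    """Statistics: infer the unknown p from data (the sample proportion)."""
    return sum(flips) / len(flips)


rng = random.Random(0)   # fixed seed so the run is repeatable
true_p = 0.3             # hypothetical "true" process, unknown in practice
flips = simulate_flips(true_p, 10_000, rng)
p_hat = estimate_p(flips)

# With the process estimated, answer a question about future outcomes:
# the chance of at least one success in the next 5 trials.
p_at_least_one = 1 - (1 - p_hat) ** 5

print(f"estimated p = {p_hat:.3f}")
print(f"P(at least one success in 5 trials) = {p_at_least_one:.3f}")
```

With enough data the estimate should land close to the value used to generate it, which is the basic idea behind the estimation chapters these books share.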

Recommended Books

  1. Probability and Statistics

    Morris H. DeGroot and Mark J. Schervish

    Key Features

    • In-text exercises
    • Answers to odd-numbered exercises
    • Solution manual available
    • Errata

    Key Topics

    • Categorical Data Analysis
    • Central Limit Theorem
    • Conditional Probability
    • Confidence Intervals
    • Estimation
    • Expectation
    • Hypothesis Testing
    • Law of Large Numbers
    • Linear Models
    • Maximum Likelihood Estimation
    • Probability
    • Probability Distributions
    • Random Variables
    • Simulation
    • Smattering of Bayesian Methods

    Description

    Frequently imitated but never duplicated, this is a canonical text in statistics. It contains all the information that would be expected from an introductory course, and it is lucidly written. However, DeGroot and Schervish does have some downsides. There is little guidance on how to carry out the computations the book describes on a computer. The Bayesian perspective also receives less emphasis than it should. If you're going to read more books on statistics, this is a great choice. However, if you're only going to read one book, you might want one that focuses more on applications.

  2. Mathematical Statistics with Resampling and R

    Laura M. Chihara and Tim C. Hesterberg

    Key Features

    • In-text exercises
    • Solutions to some exercises
    • Computer code provided
    • Errata

    Key Topics

    • ANOVA
    • Bootstrap Confidence Intervals
    • Categorical Data Analysis
    • Central Limit Theorem
    • Estimation
    • Exploratory Data Analysis
    • Hypothesis Testing
    • Linear Models
    • Maximum Likelihood Estimation
    • Permutation Tests
    • Probability Review
    • Smattering of Bayesian Methods
    • The Bootstrap

    Description

    This book departs from the usual introductory probability and statistics formula. It has no introductory probability section (only a rather terse review in the appendix), but the text itself contains many applications along with example code. It covers useful methods, such as the bootstrap and permutation testing, that would not normally appear in an introductory course. However, some traditional theory topics have been dropped. This book is a great option if you don't plan to read many more statistics books, but it must be accompanied by a probability text.
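For readers unfamiliar with the two resampling methods named above, the following is a rough Python sketch of a percentile bootstrap confidence interval and a two-sample permutation test. It is not code from the book, which works in R. The data values and the function names `bootstrap_ci` and `permutation_test` are hypothetical and exist only for illustration.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and record the mean each time.
    means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5]
group_b = [4.2, 4.8, 4.5, 4.9, 4.1, 4.6, 4.4, 4.7]

ci_low, ci_high = bootstrap_ci(group_a)
p_value = permutation_test(group_a, group_b)

print(f"95% bootstrap CI for mean of group_a: ({ci_low:.2f}, {ci_high:.2f})")
print(f"permutation test p-value: {p_value:.4f}")
```

Both methods replace distributional formulas with repeated resampling of the observed data, which is why they lend themselves to a text built around example code.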

  3. Introduction to Probability Models

    Sheldon M. Ross

    Key Features

    • In-text exercises
    • Solutions to some exercises

    Key Topics

    • Brownian Motion
    • Conditional Probability
    • Events
    • Expectation
    • Markov Chains
    • Poisson Processes
    • Queuing Theory
    • Random Variables
    • Reliability Theory
    • Renewal Theory
    • Simulation

    Description

    If you just want a probability book, rather than a statistics book, this is the one to get. It contains material far beyond what most would consider introductory from a data science perspective, but the first few chapters will give you the foundation you need for general data science. The rest of the book contains material that is useful for more advanced statistical topics, or if you are particularly interested in probability. This is a great book to pair with Mathematical Statistics with Resampling and R.