Experiments are a core component of data science; they're the only reliable way to measure the change in one variable caused by another. Because experiments are often expensive to perform, it is important to design them as efficiently as possible. No single design is optimal for every scenario, so data scientists must be familiar with the specific designs that work well in the kinds of situations they generally encounter.
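To make the idea of "different designs for different situations" concrete, here is a minimal sketch of two of the simplest designs covered by the books below: a completely randomized design and a randomized block design. It is written in Python using only the standard library, purely for illustration. The function names, the unit labels, and the treatment labels are all hypothetical and are not taken from either book.

```python
import random
from collections import Counter


def completely_randomized(units, treatments, seed=None):
    """Assign treatments to units purely at random, in near-equal group sizes.

    Hypothetical helper for illustration; not taken from either book.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin across the treatments.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}


def randomized_block(blocks, treatments, seed=None):
    """Randomize treatments separately inside each block of similar units.

    `blocks` maps a block label to a list of units. This sketch assumes
    every block holds exactly one unit per treatment.
    """
    rng = random.Random(seed)
    assignment = {}
    for label, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {label!r} needs {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # fresh random order for each block
        for unit, treatment in zip(units, order):
            assignment[unit] = treatment
    return assignment


treatments = ["A", "B", "C"]

# Homogeneous units: plain randomization is usually adequate.
crd = completely_randomized(range(12), treatments, seed=0)
print(Counter(crd.values()))

# Heterogeneous units: group similar units into blocks first, then
# randomize within each block so block differences don't bias the comparison.
blocks = {"batch 1": ["u1", "u2", "u3"], "batch 2": ["u4", "u5", "u6"]}
rbd = randomized_block(blocks, treatments, seed=0)
print(rbd)
```

Blocking trades a little bookkeeping for precision: because every treatment appears once in every block, differences between blocks cancel out of the treatment comparison.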
Recommended Books
Design and Analysis of Experiments with R
John Lawson
Key Features
- In-text exercises
- Example R code
Key Topics
- Completely Randomized Design
- Crossover and Repeated Measure Designs
- Designs to Study Variances
- Experimental Strategies for Increasing Knowledge
- Factorial Designs
- Fractional Factorial Designs
- Incomplete and Confounded Block Designs
- Linear Models
- Mixture Experiments
- Nested Designs
- Randomization
- Randomized Block Designs
- Replication
- Response Surface Designs
- Robust Parameter Design Experiments
- Split Plot Designs
Description
They say, "Don't judge a book by its cover," but this book is almost worth buying for its cover alone. Lawson systematically covers the most common experimental situations and teaches you when each design is appropriate. In many cases the basic designs introduced early in the text will work well. However, if you find yourself in a situation where your experimental units are highly heterogeneous and you have multiple factors that are hard to vary, you'll be glad you have this book to teach you the more obscure but better-performing designs. Lawson also provides example code that takes some of the guesswork out of working with arcane R packages.
Statistics for Experimenters: Design, Innovation, and Discovery
George E. P. Box, J. Stuart Hunter, William G. Hunter
Key Features
- In-text exercises
- Solutions to some exercises
Key Topics
- Blocking and Randomization
- Data Transformation
- Designing Robust Products
- Elementary Probability and Statistics
- Evolutionary Process Operation
- Factorial Designs
- Fractional Factorial Designs
- Latin Squares Design
- Linear Models
- Process Control, Forecasting, and Time Series
- Randomized Block Designs
- Response Surface Methods
- Split Plot Design
Description
This is the OG text on experimental design, and any data scientist who does a lot of experimentation will benefit from reading through it. We don't consider it the best experimental design book because it works less well as a reference than our top pick, and it tries to be too many things at once: an introductory statistics book, an experimental design book, and an operations research book. That said, it has a lot of wisdom to offer on those subjects, so if you're interested in them, this book will serve you well.