Data Science Texts

Discover what you don't know, and attack your weaknesses!

Scripting Languages

Strongly Recommended Prerequisites

Recommended Prerequisites

Last Updated: 7/18/2019

For the purposes of data science, a scripting language is a programming language suitable for interactive data analysis: you can run code a line or block at a time and inspect the results immediately. This is in contrast to a compiled language, where a program must be recompiled after every change. Although there are many scripting languages, the two most popular in data science are Python and R. Python is a general-purpose programming language that is widely used in many fields. Base Python is actually not well-suited for data science, but several extremely well-written (and free) packages make Python useful for it. R is a language that is widely used in the statistics community. Base R is designed with data analysis in mind, and a suite of popular packages makes common data science tasks even easier. Learning either language therefore involves understanding the base language and becoming familiar with its data science ecosystem.
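
To make the "base Python" point concrete, here is a minimal sketch that uses only the standard library to compute per-group counts and means over a small, made-up dataset. The data and the `summarize_by_group` helper are hypothetical illustrations, not taken from any of the books below.

```python
import statistics
from collections import defaultdict

# Hypothetical toy dataset: (group, value) pairs.
rows = [
    ("a", 1.0),
    ("a", 3.0),
    ("b", 2.0),
    ("b", 4.0),
    ("b", 6.0),
]


def summarize_by_group(rows):
    """Hypothetical helper: return {group: (count, mean)} using only the standard library."""
    groups = defaultdict(list)
    # Manually bucket each value under its group label.
    for group, value in rows:
        groups[group].append(value)
    # Compute the count and mean for each bucket.
    return {g: (len(vals), statistics.mean(vals)) for g, vals in groups.items()}


summary = summarize_by_group(rows)
print(summary)  # {'a': (2, 2.0), 'b': (3, 4.0)}
```

With Pandas, roughly the same summary is a single call such as `df.groupby("group")["value"].agg(["count", "mean"])` on a DataFrame `df` with `group` and `value` columns (again, hypothetical names). That convenience is a large part of why the package ecosystem matters.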

Most data scientists work primarily in one language or the other, but many are familiar with both Python and R. Both languages have vocal proponents, but in the grand scheme of programming languages the two are actually quite similar. We recommend starting with one and eventually learning enough of the other to be able to read it.

Recommended Books

  1. Python Crash Course

    Eric Matthes

    Key Features

    • In-text exercises
    • Guided Projects

    Key Topics

    • 'if' Statements
    • Classes
    • Data Visualization Project
    • Dictionaries
    • Exceptions
    • Files
    • Functions
    • Game Project
    • Git
    • Installing Python
    • Lists
    • Loops
    • Online Python Resources
    • Strings
    • Testing
    • Text Editors
    • Tuples
    • Variables
    • Web Application Project

    Description

    If you have not programmed before, or if you have but want a gentle and fun introduction to Python, this is a great book to get. It covers the Python basics and offers a variety of projects. It also includes a lot of supplemental software engineering material (testing, Git, text editors), which is valuable if you aren't coming in with that background.

  2. Python Essential Reference

    David M. Beazley

    Key Topics

    • Classes
    • Debugging
    • Exceptions
    • Flow Control
    • Functions and Functional Programming
    • Garbage Collection
    • Generators
    • Input and Output
    • Modules and Packages
    • Operators and Expressions
    • Profiling and Tuning
    • String Handling
    • Syntax
    • Threads and Concurrency
    • Types and Objects

    Description

    This book isn't very beginner-friendly, but its tutorial section can serve as an introduction to Python for those with a programming background. Once you've mastered the basics, this book is an extremely effective and concise reference to the Python language. Even after using Python for years, we find ourselves referring to this text to refresh ourselves on language features that we don't use very often. The book provides all the essential details in a brief and clear fashion. The list of key topics really doesn't do the coverage of this book justice.

  3. Python Data Science Handbook

    Jake VanderPlas

    Key Topics

    • IPython
    • IPython Profiling
    • Machine Learning Topics
    • Matplotlib
    • Missing Data in Pandas
    • NumPy
    • NumPy Array Indexing
    • NumPy Arrays
    • NumPy Broadcasting
    • Pandas
    • Pandas Hierarchical Indexing
    • Pandas Data Munging
    • Scikit-Learn
    • Time Series in Pandas

    Description

    The other two Python books are great for base Python. If you plan to use Python for data science, you'll want this book once you're moderately proficient with the base language. It introduces the powerful IPython interpreter, which spares you the awkward code formatting the standard Python prompt demands (for example, when typing or pasting multi-line blocks) and has many other useful capabilities besides. It also introduces the major Python data science packages: NumPy, Pandas, Matplotlib, and Scikit-Learn. The Scikit-Learn material may require pre-existing knowledge of how machine learning techniques work to be really useful.

  4. The Art of R Programming

    Norman S. Matloff

    Key Topics

    • Data Frames
    • Debugging
    • Factors and Tables
    • Flow Control
    • Functions
    • Graphics
    • Input/Output
    • Installation
    • Interfacing R to Other Languages
    • Lists
    • Mathematical Functions
    • Matrices and Arrays
    • Object Oriented Programming
    • Parallel Computing Options
    • Performance Optimization
    • String Manipulation
    • Vectorization
    • Vectors

    Description

    This is a very popular book for learning R, and it doesn't require any programming background. Since R is designed for statistics, you will jump right into doing data analysis rather than having to build a foundation in the base language first. The major drawback of this book is that it doesn't leverage RStudio, an integrated development environment that is great for beginners and pros alike. It's not hard to figure out how to use RStudio alongside this book, though.

  5. R for Data Science

    Hadley Wickham and Garrett Grolemund

    Key Topics

    • Communication
    • Dates and Times
    • Exploratory Data Analysis
    • Factors
    • Functions
    • Importing Data
    • Iteration
    • Model Building
    • Pipes
    • Plotting
    • R Markdown
    • Relational Operations (joins, etc.)
    • String Tools
    • Tibbles
    • Tidy Data
    • Transforming Data
    • dplyr
    • forcats
    • ggplot2
    • lubridate
    • magrittr
    • modelr
    • purrr
    • readr
    • stringr
    • tidyr

    Description

    Although base R is quite effective for data analysis, many packages have been written to make analysis in R easier, more reproducible, and more attractive. This book presents many of the most useful infrastructure packages (such as dplyr, ggplot2, and tidyr), although it falls somewhat short on analysis packages. Since R's analytical ecosystem is more fragmented than Python's, this is somewhat forgivable, and the infrastructure packages are the ones most commonly used anyway.

  6. Advanced R

    Hadley Wickham

    Key Topics

    • Closures
    • Coding Style Guide
    • Condition Handling
    • Data Frames
    • Data Structures
    • Debugging
    • Embedded Domain Specific Languages
    • Environments
    • Expressions
    • Function Operators
    • Functional Programming
    • Functionals
    • Functions
    • Interfacing R to C and C++
    • Lexical Scoping
    • Lists
    • Matrices and Arrays
    • Non-Standard Evaluation
    • Object Oriented R
    • Performance Optimization
    • Rcpp
    • Special Function Calls
    • Subsetting
    • Vectors

    Description

    R is an idiosyncratic language, and after spending some time using it you will certainly have some questions about how it works. This book answers a lot of those questions. The knowledge we've gained from this book has helped us ferret out many pernicious bugs, so we recommend that anyone who uses R a lot read this book. As the title indicates, it's really not for beginners.