Information theory is the study of how much information a particular object contains or can transmit, whether that object is a random variable, a communication channel, or a message. It's very important for topics like networking and compression, but it also has more direct applications to data science, such as establishing lower bounds on the variance of estimators.
Recommended Books

Elements of Information Theory
Thomas M. Cover and Joy A. Thomas
In-text exercises
This is the canonical text on information theory. It is very well written and covers a lot of applications. Most of the applications are related to computer science, but they are relevant to data science as well. There are also very interesting chapters on the relationship between information theory and statistics, as well as an interesting chapter on portfolio optimization. The best part about this book is that detailed solutions to the exercises are easy to find online. The mathematical difficulty of this book is fairly low; all you really need is intermediate probability and some gumption.