To learn data science, you need a solid foundation in mathematics, and statistics is one of the most essential math skills for data science.
However, learning statistics can be intimidating, especially if your background is not in mathematics or computer science. To help you get started, we've compiled a list of free books that make statistics for data science accessible.
Most of these books take a practical approach to statistical concepts, which is what you need to use statistics effectively as a data scientist. So let's review these statistics books.
Introductory Statistics is an accessible book that covers the material typically taught in a one-semester introductory statistics course at universities.
Available for free on OpenStax and written by a team of expert contributing authors, this book takes an application-first rather than a theory-first approach to statistics and includes examples and exercises for each topic.
This book will help you learn the following:
- Sampling and data
- Descriptive statistics
- Probability topics and random variables
- Normal distribution
- The central limit theorem
- Confidence intervals
- Hypothesis testing
- The chi-square distribution
- Linear regression and correlation
- F distribution and one-way ANOVA
Link: Introductory Statistics 2e
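To give a flavor of how the central limit theorem and confidence intervals fit together, here is a minimal Python sketch (not taken from the book) that computes an approximate 95% confidence interval for a mean. It assumes a reasonably large sample so that the normal approximation holds; the function name `mean_confidence_interval` is just an illustrative choice.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal approximation justified by the central limit theorem,
    so it assumes a reasonably large sample.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(data)
    standard_error = stdev(data) / math.sqrt(n)  # sample SD / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_bar - z * standard_error, x_bar + z * standard_error

low, high = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
print(round(low, 2), round(high, 2))  # 3.52 6.48
```

The tiny dataset is only there to show the call. In practice you would want many more observations, or a t-based interval for small samples.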
Introduction to Modern Statistics is a free online textbook from the OpenIntro project, written by Mine Çetinkaya-Rundel and Johanna Hardin.
If you want to learn the statistical fundamentals for effective data analysis, this book is for you. It covers the following:
- Introduction to data
- Exploratory data analysis
- Regression modeling
- Foundations of inference
- Statistical inference
- Inferential modeling
Link: Introduction to Modern Statistics
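Regression modeling starts with fitting a straight line by least squares. As a minimal illustration (our own plain-Python sketch, not code from the book, whose examples may use different tooling), the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept follows from the sample means:

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Slope: sum of (x - x_bar)(y - y_bar) over sum of (x - x_bar)^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```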
Think Stats by Allen B. Downey will help you learn and practice statistics concepts using Python.
This way, you can apply your Python skills to learning statistics and probability and to working with data effectively. As you progress through the book, you will write short Python programs and practice on real datasets to reinforce your understanding of statistical concepts.
The topics discussed are the following:
- Exploratory data analysis
- Distributions
- Probability mass functions
- Cumulative distribution functions
- Modeling distributions
- Probability density functions
- Relationships between variables
- Estimation
- Hypothesis testing
- Linear least squares
- Regression
- Survival analysis
- Analytical methods
Link: Think Stats 2e
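Probability mass functions and cumulative distribution functions are among the first tools the book builds. The book develops its own classes for these; the sketch below is an independent, standard-library-only illustration of the two ideas, with hypothetical function names `pmf` and `cdf`:

```python
from collections import Counter

def pmf(values):
    """Map each distinct value to the fraction of observations equal to it."""
    n = len(values)
    counts = Counter(values)
    return {x: c / n for x, c in sorted(counts.items())}

def cdf(values, x):
    """Fraction of observations less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)

data = [1, 2, 2, 3, 5]
print(pmf(data))     # {1: 0.2, 2: 0.4, 3: 0.2, 5: 0.2}
print(cdf(data, 2))  # 0.6
```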
Computational and Inferential Thinking: The Foundations of Data Science by Ani Adhikari, John DeNero, and David Wagner will help you learn the statistical foundations for data science.
This book was developed as a companion to the Data 8: The Foundations of Data Science course offered at UC Berkeley. Topics covered in this book include:
- Introduction to data science
- Python programming
- Data Types, Sequences, and Tables
- Visualization
- Functions and Tables
- Randomness
- Sampling and empirical distributions
- Hypothesis testing
- Estimation
- Regression
- Classification
Link: Computational and Inferential Thinking: The Foundations of Data Science
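This book leans heavily on simulation to test hypotheses. As a rough sketch of that idea in plain Python (the book itself works with its own table abstraction, and this function is our own illustration), a two-sample permutation test shuffles the group labels many times and counts how often the shuffled difference in means is at least as large as the observed one:

```python
import random

def permutation_test(a, b, num_repetitions=10_000, seed=0):
    """Two-sample permutation test on the absolute difference in means.

    Returns the observed statistic and an empirical p-value: the fraction of
    random relabelings whose statistic is at least as large as the observed one.
    """
    def diff_in_means(x, y):
        return abs(sum(x) / len(x) - sum(y) / len(y))

    rng = random.Random(seed)  # seeded so results are reproducible
    observed = diff_in_means(a, b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(num_repetitions):
        rng.shuffle(pooled)  # relabel at random, as the null hypothesis allows
        if diff_in_means(pooled[:len(a)], pooled[len(a):]) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / num_repetitions

observed, p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(observed, p_value)
```

Here only 2 of the 252 equally likely ways to split the ten values into two groups of five are as extreme as the observed split, so the simulated p-value should come out close to 2/252, or roughly 0.008.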
Probabilistic Programming and Bayesian Methods for Hackers, or simply Bayesian Methods for Hackers, is a popular book on Bayesian methods in statistics.
“Bayesian Methods for Hackers”: An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python 😉
Source: the book's GitHub repository
You will become familiar with probability theory and Bayesian inference by working with the PyMC library. The book covers the following:
- Introduction to Bayesian methods
- The PyMC library
- Markov chain Monte Carlo (MCMC)
- The law of large numbers
- Loss functions
- Priors
Link: Probabilistic Programming and Bayesian Methods for Hackers
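The book does its sampling with PyMC. To show roughly what an MCMC sampler does under the hood, here is a minimal random-walk Metropolis sketch in plain Python (not PyMC, and not code from the book) for the bias of a coin, assuming a uniform Beta(1, 1) prior. The step size, burn-in length, and function names are illustrative choices. With 7 heads and 3 tails the exact posterior is Beta(8, 4), whose mean is 8/12, so the sample mean should land near 0.667.

```python
import math
import random

def log_posterior(p, heads, tails):
    """Unnormalized log posterior of a coin's bias p under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return heads * math.log(p) + tails * math.log(1.0 - p)

def metropolis(heads, tails, n_samples=20_000, step=0.1, burn_in=2_000, seed=42):
    """Random-walk Metropolis sampler; returns draws after discarding burn-in."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, heads, tails)
    draws = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio), computed in log space
        if candidate >= current or rng.random() < math.exp(candidate - current):
            p, current = proposal, candidate
        draws.append(p)  # on rejection, the current value is repeated
    return draws[burn_in:]

samples = metropolis(heads=7, tails=3)
print(sum(samples) / len(samples))  # should be close to 8/12
```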
I hope you found this summary of free statistics books useful. The combination of theory and hands-on practice should help you improve your data science skills and make more informed decisions when working with large real-world data sets.
If you prefer to work with free courses or are looking to supplement your reading with courses, check out 5 Free Courses to Master Statistics for Data Science.
Bala Priya C is a developer and technical writer from India. She enjoys working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She likes to read, write, code, and drink coffee! Currently, she is working on learning and sharing her knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.