Learning statistics is a crucial part of your journey towards becoming a data scientist, data analyst, or even an AI engineer. Most of the machine learning models used in modern technology are statistical models. A solid understanding of statistics will therefore make it easier for you to learn and develop advanced AI technologies.
In this blog, we will explore 10 GitHub repositories that will help you master statistics. These repositories include code samples, books, Python libraries, guides, documentation, and visual learning materials.
1. Practical Statistics for Data Scientists
Repository: gedeck/practical-statistics-for-data-scientists
This repository offers practical examples and code snippets from the book “Practical Statistics for Data Scientists” that cover essential statistical techniques and concepts. It is an excellent starting point for data scientists who want to apply statistical methods in real-world situations.
The book's code repository contains code examples in both R and Python. If you prefer working in Jupyter, the same examples are also available as notebooks for Python and R.
2. Probabilistic Programming and Bayesian Methods for Hackers
Repository: CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
This repository provides an interactive and practical introduction to Bayesian methods with Python. The content is presented as Jupyter notebooks that can be read through nbviewer, making it easy to follow both the theory and the Python code for Bayesian models and probabilistic programming.
The interactive book covers an introduction to Bayesian methods, an introduction to the Python PyMC library, Markov Chain Monte Carlo, the law of large numbers, loss functions, and more.
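To give a flavor of the Bayesian updating that the book introduces, here is a minimal sketch of a coin-flip example using only the Python standard library. It is not taken from the repository and does not use PyMC. It relies on the conjugate Beta-Binomial result instead of sampling, and the function names and the data (7 heads, 3 tails) are hypothetical.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, heads, tails):
    """Return the posterior Beta parameters after observing coin flips.

    With a Beta(alpha, beta) prior on the probability of heads and a
    binomial likelihood, the posterior is Beta(alpha + heads, beta + tails).
    """
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact as a fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical data: uniform Beta(1, 1) prior, then 7 heads and 3 tails.
post_alpha, post_beta = beta_binomial_update(1, 1, heads=7, tails=3)
posterior_mean = beta_mean(post_alpha, post_beta)
print(post_alpha, post_beta, posterior_mean)
```

The posterior here is Beta(8, 4), whose mean of 2/3 sits between the prior mean of 1/2 and the observed frequency of 7/10. Models without a closed-form posterior are where sampling methods such as Markov Chain Monte Carlo come in.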
3. Statsmodels: Statistical Modeling and Econometrics in Python
Repository: statsmodels/statsmodels
Statsmodels is a powerful library for statistical modeling and econometrics in Python. This repository includes comprehensive documentation and examples covering statistical tests, linear models, and more. You can work through the documentation examples to learn how to perform many kinds of statistical analysis, including time series analysis, survival analysis, multivariate analysis, and linear regression.
4. TensorFlow Probability
Repository: tensorflow/probability
TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. It extends the core TensorFlow library with tools for building and training probabilistic models, making it an excellent resource for those interested in combining deep learning with statistical modeling.
The documentation contains examples of linear mixed effects models, hierarchical linear models, probabilistic principal component analysis, Bayesian neural networks, and more.
5. The Probability and Statistics Cookbook
Repository: mavam/stat-cookbook
This repository is a collection of recipes for solving common statistical problems, serving as a handy reference for quick solutions and examples. It provides a concise guide to probability and statistics, including probability theory, random variables, continuous distributions, expectation, variance, and inequalities. You can build the cookbook locally with the make command or download the PDF file. The repository also includes the LaTeX source files for the various statistical concepts.
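As an illustration of the kind of inequality such a reference lists, the sketch below checks Chebyshev's inequality, P(|X - mean| >= k * std) <= 1 / k^2, on simulated data. It is not taken from the cookbook. The helper name, the exponential distribution, the sample size, and the seed are all assumptions chosen for the example.

```python
import random
import statistics


def chebyshev_check(samples, k):
    """Return (empirical tail frequency, Chebyshev bound 1 / k**2).

    The tail frequency is the share of samples lying at least k
    population standard deviations away from the sample mean.
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    tail = sum(abs(x - mean) >= k * std for x in samples) / len(samples)
    return tail, 1 / k**2


# Hypothetical data: 10,000 draws from an exponential distribution with rate 1.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(10_000)]

tail, bound = chebyshev_check(samples, k=2)
print(f"empirical tail: {tail:.4f}, Chebyshev bound: {bound:.4f}")
```

The empirical tail frequency should come out well below the bound of 0.25, which shows how conservative Chebyshev's inequality is for a specific distribution.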
6. Seeing Theory
Repository: seeing-theory/seeing-theory
Seeing Theory is a visual introduction to probability and statistics. This repository includes interactive visualizations and explanations that make complex statistical concepts more accessible and easier to understand, especially for visual learners.
It is a highly interactive book for beginners that covers topics such as basic probability, compound probability, probability distributions, frequentist inference, Bayesian inference, and regression analysis.
7. Mathematical Statistics with Python
Repository: tirthajyoti/Stats-Maths-with-Python
This repository contains scripts and Jupyter notebooks covering general statistics, mathematical programming, and scientific computing with Python. It is a valuable resource for anyone looking to strengthen their statistical and mathematical programming skills.
It includes examples of Bayes' rule, Brownian motion, hypothesis testing, linear regression, and more.
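For a sense of what a hypothesis-testing example can look like in plain Python, here is a minimal sketch of a two-sided permutation test for a difference in means. It is not taken from the repository. The function name, the two small samples, the number of resamples, and the seed are hypothetical.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Group labels are shuffled repeatedly, and we count how often the
    shuffled difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3, 5.8]

p_value = permutation_test(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

Because the two hypothetical groups do not overlap at all, the estimated p-value should be small. Running the same function on two identical samples returns 1.0, since every shuffled difference is at least as large as an observed difference of zero.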
8. Python for Probability, Statistics, and Machine Learning
Repository: unpingco/Python-for-probability-statistics-and-machine-learning
This repository includes code examples and Jupyter notebooks from the book “Python for Probability, Statistics, and Machine Learning” that cover a wide range of topics from basic probability and statistics to advanced machine learning techniques.
Inside the “chapters” folder, there are three subfolders containing Jupyter notebooks on statistics, probability, and machine learning. Each notebook includes code, results, and a description explaining the methodology, code, and results.
9. VIP Cheat Sheets on Probability and Statistics
Repository: shervinea/stanford-cme-106-probability-and-statistics
This repository contains the VIP cheat sheets for Stanford's CME 106 course on probability and statistics for engineers. The cheat sheets provide concise summaries of key concepts and formulas, making them a useful reference for students and professionals.
These popular cheat sheets cover conditional probability, random variables, parameter estimation, hypothesis testing, and more.
10. Basic Mathematics for Machine Learning
Repository: hrnbot/Basic-Mathematics-for-Machine-Learning
Understanding mathematical foundations is critical to mastering machine learning and statistics. This repository aims to demystify mathematics and help you learn the basics of algebra, calculus, statistics, probability, vectors, and matrices through Python Jupyter Notebooks.
Final Thoughts
The learning resources shared on GitHub are created by experts and the open-source community to make the path easier for beginners in data science and statistics. You will learn statistics by reading theory, working through code examples, understanding mathematical concepts, building projects, performing various analyses, and exploring popular statistical tools. All of this is covered in the GitHub repositories mentioned above. These resources are free, and anyone can contribute to improving them. So keep learning and keep creating awesome things.
Abid Ali Awan (@1abidaliawan) is a certified data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a Bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.