Welcome to 'Courage to Learn ML'. This series aims to simplify complex machine learning concepts, presenting them as a relaxed and informative dialogue, much like the engaging style of "The Courage to Be Disliked", but with a focus on ML.
In this installment of our series, our mentor-student duo dives into a new discussion on statistical concepts like MLE and MAP. This discussion will lay the groundwork for a fresh perspective on our previous exploration of L1 and L2 regularization. To get the full picture, I recommend reading this post before part four of 'Courage to Learn ML: Demystifying L1 and L2 Regularization'.
This article addresses, in a question-and-answer style, fundamental questions that may have crossed your mind. As always, if you have similar questions, you've come to the right place:
- What exactly is “probability”?
- The difference between likelihood and probability.
- Why is probability important in the context of machine learning?
- What is MLE (maximum likelihood estimation)?
- What is MAP (maximum a posteriori estimation)?
- The difference between MLE and least squares
- The links and distinctions between MLE and MAP
Likelihood, or more specifically the likelihood function, is a statistical concept used to evaluate the plausibility of different sets of model parameters, given the observed data. It is called the likelihood function because it is a function that quantifies how likely the observed data are under different parameter values of a statistical model.
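To make this idea concrete, here is a minimal sketch (my own illustration, not from the article) that evaluates the likelihood of a fixed set of coin flips under a Bernoulli model, for several candidate values of the bias parameter theta:

```python
# Observed data: 10 coin flips, where 1 = heads and 0 = tails (7 heads, 3 tails).
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]

def likelihood(theta, data):
    """Likelihood of the data under a Bernoulli(theta) model:
    the product of theta for each head and (1 - theta) for each tail."""
    result = 1.0
    for x in data:
        result *= theta if x == 1 else 1 - theta
    return result

# The data stay fixed; only the parameter value varies.
for theta in [0.3, 0.5, 0.7, 0.9]:
    print(f"theta = {theta}: likelihood = {likelihood(theta, data):.6f}")
```

Notice that the data are held fixed while the parameter varies, which is exactly how the likelihood function is read; the parameter value that makes this function largest is the maximum likelihood estimate discussed below.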
The concepts of likelihood and probability are fundamentally different in statistics. Probability measures the chance of observing a specific outcome in the future, given known parameters or distributions.…