In my previous article, we introduced the multilayer perceptron (MLP), which is just a set of interconnected stacks of perceptrons. I highly recommend checking out that post if you are not familiar with perceptrons and MLPs, as we will discuss them quite a bit in this article:
Below is an example of an MLP with two hidden layers:
However, the problem with this MLP is that it can only fit a linear classifier. This is because individual perceptrons use a step function as their activation function, which leaves each perceptron with a linear decision boundary:
So even though our perceptron stacking may look like a modern neural network, it is still a linear classifier and is not much different from normal linear regression.
Another problem is that the step function is not differentiable across its entire domain, which makes gradient-based training difficult.
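To make the first point concrete, here is a minimal NumPy sketch showing the collapse: two stacked layers with no non-linear activation reduce to a single linear map (the layer sizes and random weights are just illustrative).

```python
import numpy as np

# Two "layers" with purely linear activations (illustrative sizes and weights).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # hidden layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # output layer

def two_layer_linear(x):
    h = W1 @ x + b1          # no non-linear activation applied
    return W2 @ h + b2

# The same mapping collapses into one linear layer:
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_layer_linear(x), W @ x + b))  # True
```

No matter how many such layers we stack, the end result is always equivalent to one linear transformation.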
So what do we do about it?
Non-linear activation functions!
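As a quick preview, two widely used non-linear activations, sigmoid and ReLU, look like this in NumPy (these two are just my illustrative picks; any non-linearity would make the same point):

```python
import numpy as np

def sigmoid(z):
    # Smooth, differentiable everywhere, squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Non-linear but very cheap to compute: max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))
print(relu(z))
```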
What is linearity?
Let’s quickly explain what linearity means to build some context. Mathematically, a function is considered linear if it meets the following condition:
There is also another condition:
But we will work with the previous equation for this demonstration.
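For reference, the two standard conditions for a linear function are homogeneity and additivity:

```latex
f(\alpha x) = \alpha f(x)      % homogeneity
f(x + y) = f(x) + f(y)         % additivity
```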
Let’s take this very simple case: