This algorithm is known as “Gradient Descent”, or the “Steepest Descent Method”: an optimization method for finding the minimum of a function in which each step is taken in the direction of the negative gradient. This method does not guarantee that the global minimum of the function will be found, only a local minimum.
A discussion of how to find the global minimum could be left for another article; here, we have mathematically demonstrated how the gradient can be used to find a minimum.
Now, applying it to the cost function, which depends on the n weights w, we have:
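In the usual notation, writing the cost function as J(W) for a weight vector W = (w_1, …, w_n) (the symbols J, W, and w_i are a notation assumed here for illustration, not taken from the original figures), the gradient collects the partial derivative of the cost with respect to each weight:

```latex
\nabla J(W) = \left( \frac{\partial J(W)}{\partial w_1},\; \frac{\partial J(W)}{\partial w_2},\; \ldots,\; \frac{\partial J(W)}{\partial w_n} \right)
```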
To update all of the elements of W based on gradient descent, we have:
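A minimal sketch of that update, assuming a learning rate α > 0 (another symbol introduced here for illustration), subtracts the scaled gradient from the current weight vector at every step:

```latex
W \leftarrow W - \alpha \, \nabla J(W)
```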
and, for each individual element w of the vector W, we have:
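Writing w_i for the i-th weight (still under the notation assumed above), the same rule reads, component by component:

```latex
w_i \leftarrow w_i - \alpha \, \frac{\partial J(W)}{\partial w_i}, \qquad i = 1, \ldots, n
```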
Therefore, we have our theoretical learning algorithm. Naturally, this applies not just to the hypothetical idea of the cook, but to numerous machine learning algorithms we know today.
Based on what we have seen, we can conclude the demonstration and mathematical proof of the theoretical learning algorithm. The same framework underlies numerous optimization methods used for learning, such as AdaGrad, Adam, and Stochastic Gradient Descent (SGD).
This method does not guarantee finding the n weight values w for which the cost function returns a result equal to zero, or very close to it. However, it does assure us that a local minimum of the cost function will be found.
To address the issue of local minima, there are several more robust methods, such as SGD and Adam, which are commonly used in deep learning.
However, understanding the structure and mathematical proof of the theoretical learning algorithm based on gradient descent will make it easier to understand more complex algorithms.
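To make that structure concrete, here is a minimal sketch of the update loop in Python; the quadratic cost, the learning rate, and the number of iterations are illustrative choices, not values from the derivation above:

```python
import numpy as np

def gradient_descent(grad_J, w_init, learning_rate=0.1, n_steps=100):
    """Repeatedly step the weights in the direction of the negative gradient."""
    w = np.array(w_init, dtype=float)
    for _ in range(n_steps):
        w -= learning_rate * grad_J(w)  # W <- W - alpha * grad J(W)
    return w

# Illustrative cost J(W) = (w1 - 3)^2 + (w2 + 1)^2, whose gradient is known in closed form.
def grad_J(w):
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

w_star = gradient_descent(grad_J, w_init=[0.0, 0.0])
print(w_star)  # converges toward the minimizer (3, -1)
```

Each iteration needs only the gradient of the cost at the current weights, which is exactly the ingredient the demonstration above relies on.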