Image generated by DALL-E 2
The current trend in the world of machine learning is all about advanced models. The movement is largely driven by the fact that the reference models in many courses are complex ones, and it seems far more impressive to use a model like deep learning or LLMs. Business people have not helped with this idea either, as they only see the popular trend.
Simplicity does not mean disappointing results. A simple model just means that the steps it takes to deliver the solution are less complicated than those of an advanced model. It may use fewer parameters or simpler optimization methods, but a simple model is still valid.
Referring to a principle of philosophy, Occam's razor, or the Law of Parsimony, states that the simplest explanation is usually the best. It implies that most problems can usually be solved by the simplest approach. That is the value of the simple model: by its simple nature, it solves the problem.
A simple model is as important as any other type of model. That is the crucial message this article wants to convey, and we will explore why. So, let's get into it.
When we talk about simple models, what constitutes one? Logistic regression or naïve Bayes is often called a simple model, while neural networks are complex; how about random forest? Is it a simple or complex model?
We generally do not classify random forest as a simple model, but we often hesitate to classify it as complex. This is because there are no strict rules governing how a model's simplicity is classified. However, there are some aspects that can help classify a model. They are:
– Number of parameters,
– Interpretability,
– Computational efficiency.
These aspects also shape the benefits a model provides. Let's analyze them in more detail.
Number of parameters
A parameter is an internal configuration of the model that is learned or estimated during the training process. Unlike a hyperparameter, a parameter cannot be set by the user beforehand, although its learned value is affected by the hyperparameter choices.
Examples of parameters include linear regression coefficients, neural network weights and biases, and K-means cluster centroids. As you can see, model parameter values change on their own as the model learns from the data. The parameter values are updated at every training iteration until we arrive at the final model.
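To make the distinction concrete, here is a minimal sketch with scikit-learn (my choice of library, for illustration only): hyperparameters are set in the constructor before training, while parameters such as the K-means centroids only exist after fitting.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data for illustration.
X = np.random.default_rng(0).normal(size=(200, 2))

# Hyperparameters: chosen by the user before training.
model = KMeans(n_clusters=3, n_init=10, random_state=0)

# Parameters: estimated from the data during training.
model.fit(X)
print(model.cluster_centers_)  # the learned K-means centroids
```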
Linear regression is a simple model because it has few parameters. The parameters of linear regression are its coefficients and its intercept. Depending on the number of features we train on, linear regression has n+1 parameters (n feature coefficients plus 1 for the intercept).
A neural network, by comparison, is more complex. Its parameters consist of weights and biases. The number of weights in a layer depends on the layer inputs (n) and neurons (p), so a layer has n*p weight parameters. Each neuron also has its own bias, adding p bias parameters. In total, a layer has (n*p) + p parameters. The complexity then grows with every additional layer, each contributing another (n*p) + p parameters.
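As a rough sketch of these counting rules (the helper names are mine, assuming fully connected layers):

```python
def linear_regression_params(n_features: int) -> int:
    # n coefficients plus 1 intercept
    return n_features + 1

def dense_layer_params(n_inputs: int, n_neurons: int) -> int:
    # (n * p) weights plus p biases
    return n_inputs * n_neurons + n_neurons

print(linear_regression_params(10))   # 11
print(dense_layer_params(10, 64))     # 704
# Each extra layer adds another (n * p) + p parameters:
print(dense_layer_params(10, 64) + dense_layer_params(64, 64))  # 4864
```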
We have seen that the number of parameters affects a model's complexity, but how does it affect the model's overall performance? The most crucial effect is on the risk of overfitting.
Overfitting occurs when our model has little generalization power because it has learned the noise in a dataset. With more parameters, the model can capture more complex patterns in the data, but it also picks up noise, since the model assumes it is significant. In contrast, a model with fewer parameters has limited capacity, which makes it harder to overfit.
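This effect is easy to reproduce. Below is a minimal sketch on synthetic data, assuming scikit-learn is available; the exact numbers will vary, but the high-degree model typically wins on the training set and loses on the test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X.ravel() + rng.normal(scale=1.0, size=100)  # linear signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for degree in (1, 15):  # 2 parameters vs. 16 parameters
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree}: "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.3f}, "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.3f}")
```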
There are also direct effects on interpretability and computational efficiency, which we discuss later.
Interpretability
Interpretability is a machine learning concept that refers to the ability to explain a model's results; basically, it is how well the user can understand the model's behavior. A significant value of simple models lies in their interpretability, which is a direct effect of having fewer parameters.
With fewer parameters, the interpretability of a simple model increases, as the model becomes easier to explain. Furthermore, the internal workings of the model are more transparent, as it is easier to understand the role of each parameter than in a complex model.
For example, linear regression coefficients are easy to explain, since each coefficient directly reflects its feature's influence. In contrast, with a complex model like a neural network, it is challenging to explain a parameter's direct contribution to the prediction.
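For instance, here is a minimal sketch of reading linear regression coefficients directly (the feature names and numbers are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[50, 2], [60, 3], [80, 3], [100, 4]])  # e.g., size_m2, rooms
y = np.array([150, 180, 240, 300])                   # e.g., price

model = LinearRegression().fit(X, y)
for name, coef in zip(["size_m2", "rooms"], model.coef_):
    # Each coefficient reads directly: holding the other feature fixed,
    # a one-unit increase in this feature changes the prediction by `coef`.
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```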
The value of interpretability is enormous in many lines of business or projects, since certain businesses require that results be explainable. For example, prediction in the medical field requires explainability, as medical experts must have confidence in the result; after all, it affects individual lives.
Avoiding bias in model decisions is also a reason why many prefer a simple model. Imagine a lending company training a model on a dataset full of biases; the results would reflect those biases. We want to eliminate bias because it is unethical, so explainability is vital for detecting it.
Computational efficiency
Another direct effect of having fewer parameters is an increase in computational efficiency. Fewer parameters mean less time to find them and less computational power.
In production, a more computationally efficient model is easier to deploy and has a shorter inference time in the application. This also means simple models can run more easily on resource-constrained devices such as smartphones.
In general, a simple model uses fewer resources, which translates into less money spent on processing and deployment.
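Here is a minimal sketch of this trade-off, assuming scikit-learn (the models and data sizes are my choice, and absolute timings depend on the machine):

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20_000, n_features=20, random_state=42)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=300, random_state=42)):
    start = time.perf_counter()
    model.fit(X, y)                    # training cost
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X)                   # inference cost
    infer_time = time.perf_counter() - start

    print(f"{type(model).__name__}: train={train_time:.2f}s, infer={infer_time:.2f}s")
```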
We might underestimate a simple model because it doesn't look sophisticated or doesn't deliver the best possible metrics. However, there is a lot of value we can take from a simple model. Looking at the aspects that classify a model's simplicity, a simple model provides these values:
– Simple models have a smaller number of parameters, which also reduces the risk of overfitting.
– With fewer parameters, a simple model provides greater explainability.
– Furthermore, a smaller number of parameters means a simple model is computationally efficient.
Cornellius Yudha Wijaya is an assistant data science manager and data writer. While working full-time at Allianz Indonesia, he loves sharing Python tips and data through social media and print media.