Image by upklyak on Freepik
I'm sure everyone knows about the GBM and XGBoost algorithms. They are go-to algorithms for many real-world use cases and competitions because they usually deliver better metric results than other models.
For those who don't know GBM and XGBoost, GBM (Gradient Boosting Machine) and XGBoost (eXtreme Gradient Boosting) are ensemble learning methods. Ensemble learning is a machine learning technique in which multiple “weak” models (often decision trees) are trained and combined to produce a stronger overall prediction.
As their names suggest, both algorithms are based on the boosting technique of ensemble learning. Boosting is a method that combines several weak learners sequentially, each one correcting its predecessor: every new learner learns from the mistakes of the previous models and corrects their errors.
That's the fundamental similarity between GBM and XGBoost, but what about the differences? That is what we will discuss in this article, so let's get into it.
As mentioned above, GBM is based on boosting, which sequentially iterates weak learners so that each one learns from the previous errors, building up a robust model. At every iteration, GBM produces a better model by minimizing the loss function with gradient descent. Gradient descent is a method for finding the minimum of a function, such as the loss function, through repeated iterations. The iterations continue until a stopping criterion is reached.
The GBM concept is illustrated in the image below.
Concept of the GBM model (Chhetri et al. (2022))
You can see in the image above that at each iteration the model tries to minimize the loss function and learn from the previous errors. The final model combines all of the weak learners, summing their predictions into the overall prediction.
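To make the concept concrete, here is a minimal GBM sketch using scikit-learn's GradientBoostingRegressor; the synthetic dataset and hyperparameter values are illustrative only, not a recommendation.

```python
# Minimal GBM sketch; dataset and hyperparameters are only illustrative.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 shallow trees is a weak learner fitted to the gradient
# of the loss left over by the previous iterations.
gbm = GradientBoostingRegressor(
    n_estimators=100,   # number of boosting iterations
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=3,        # keeps every learner "weak"
    random_state=42,
)
gbm.fit(X_train, y_train)
print(gbm.score(X_test, y_test))  # R^2 on the held-out split
```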
XGBoost or eXtreme Gradient Boosting is a machine learning algorithm based on the gradient boosting algorithm, developed by Tianqi Chen and Carlos Guestrin in 2016. At a basic level, the algorithm still follows a sequential strategy, improving the next model based on gradient descent. However, XGBoost introduces several differences that place it among the best models in terms of performance and speed.
1. Regularization
Regularization is a machine learning technique to avoid overfitting. It is a collection of methods that penalize a model for becoming overly complicated and losing generalization power. It has become an important technique, as many models fit the training data too well.
GBM does not implement regularization in its algorithm, so it focuses only on minimizing the loss function. XGBoost, in contrast, implements regularization methods to penalize an overfitting model.
There are two types of regularization that XGBoost can apply: L1 regularization (Lasso) and L2 regularization (Ridge). L1 regularization shrinks feature weights or coefficients toward zero (effectively acting as feature selection), while L2 regularization shrinks the coefficients uniformly (which helps deal with multicollinearity). By implementing both regularizations, XGBoost can avoid overfitting better than GBM.
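As a quick illustration, here is how both penalties are exposed in XGBoost's scikit-learn wrapper through the reg_alpha (L1) and reg_lambda (L2) parameters; the values below are placeholders rather than tuned recommendations.

```python
# Sketch of L1/L2 regularization in XGBoost; the penalty values are placeholders.
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    reg_alpha=0.5,   # L1 (Lasso) penalty on the leaf weights
    reg_lambda=1.0,  # L2 (Ridge) penalty on the leaf weights
)
# model.fit(X_train, y_train) would then train the regularized booster.
```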
2. Parallelization
GBM tends to have a slower training time than XGBoost because the latter implements parallelization during the training process. Boosting itself is sequential, but the tree construction within each boosting round can still be parallelized in XGBoost.
Parallelization speeds up tree construction, mainly during the split-finding step. By using all available processing cores, XGBoost can shorten its training time.
Speaking of speeding up the XGBoost process, the developers also created their own data format, DMatrix, which pre-processes the data for memory efficiency and faster training.
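The sketch below shows both ideas in the native XGBoost API: the data is wrapped in a DMatrix, and the nthread parameter controls how many cores tree construction may use. The synthetic data and parameter values are illustrative.

```python
# Sketch of the native XGBoost API with DMatrix and thread control;
# data and parameter values are only illustrative.
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=10, random_state=42)
dtrain = xgb.DMatrix(X, label=y)  # XGBoost's memory-efficient data format

params = {
    "objective": "reg:squarederror",
    "max_depth": 3,
    "eta": 0.1,
    "nthread": 4,  # parallel threads; defaults to all available cores if unset
}
booster = xgb.train(params, dtrain, num_boost_round=200)
preds = booster.predict(dtrain)
```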
3. Handling missing data
Our training data could contain missing values, which we normally need to handle explicitly before passing the data to an algorithm. However, XGBoost has its own built-in missing-data handler, while GBM does not.
XGBoost implements its own technique for handling missing data, called sparsity-aware split finding. For any sparse data that XGBoost encounters (missing values, dense zeros, one-hot-encoded features), the model learns from it and finds the most optimal split: during training it decides which direction missing values should go at each split by choosing the direction that minimizes the loss.
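As a small illustration, XGBoost can be trained directly on data containing NaN values without any imputation; the tiny toy dataset below exists only to show the mechanics.

```python
# Sketch: XGBoost accepts NaN entries directly and learns a default
# direction for them at each split (sparsity-aware split finding).
import numpy as np
import xgboost as xgb

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],   # missing value, no imputation needed
              [4.0, np.nan],
              [5.0, 6.0]])
y = np.array([0, 1, 1, 0])

dtrain = xgb.DMatrix(X, label=y, missing=np.nan)  # NaN flagged as missing
params = {"objective": "binary:logistic", "max_depth": 2}
booster = xgb.train(params, dtrain, num_boost_round=10)
```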
4. Tree pruning
The GBM growth strategy is to stop splitting a node as soon as a split produces a negative loss reduction. This strategy can lead to suboptimal results because it relies solely on local optimization and can neglect the bigger picture.
XGBoost avoids the GBM strategy: it grows the tree until the maximum depth set by the parameter and only then prunes it back. Splits with a negative loss reduction are removed, with one exception: if a split has a negative loss reduction but a split further down has a positive one, the split is retained as long as the overall gain is positive.
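In practice, this behavior is steered through the max_depth and gamma parameters (gamma is the minimum loss reduction a split must achieve to survive pruning); the sketch below shows where they appear, with placeholder values.

```python
# Sketch of the pruning-related parameters; the values are placeholders.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=100,
    max_depth=6,  # grow each tree to this depth first, then prune backwards
    gamma=1.0,    # splits with loss reduction below this are pruned away
)
# model.fit(X_train, y_train) would then train with this pruning behavior.
```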
5. Built-in cross validation
Cross-validation is a technique to evaluate the generalizability and robustness of a model by systematically splitting the data over several iterations. Taken together, the results show whether the model is overfitting or not.
Normally, a machine learning algorithm requires external tooling to implement cross-validation, but XGBoost has built-in cross-validation that can be used during the training session. Cross-validation is performed at each boosting iteration, which helps ensure that the produced trees are robust.
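Here is a brief sketch of the built-in xgb.cv function, which evaluates every boosting round across the folds; the synthetic dataset and parameter values are illustrative only.

```python
# Sketch of XGBoost's built-in cross-validation; values are placeholders.
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

cv_results = xgb.cv(
    params={"objective": "binary:logistic", "max_depth": 3, "eta": 0.1},
    dtrain=dtrain,
    num_boost_round=100,
    nfold=5,                   # 5-fold cross-validation
    early_stopping_rounds=10,  # stop when the validation metric stalls
    metrics="logloss",
)
print(cv_results.tail())  # per-round train/test mean and std of logloss
```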
GBM and XGBoost are popular algorithms in many real-world cases and competitions. Conceptually, both are boosting algorithms that use weak learners to achieve better models. However, there are a few differences in their implementations. XGBoost improves the algorithm by incorporating regularization, performing parallelization, handling missing data better, using a different tree-pruning strategy, and providing a built-in cross-validation technique.
Cornellius Yudha Wijaya is an assistant data science manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python tips and data through social media and print media.