I often talk about explainable AI (XAI) methods and how they can be adapted to address some of the pain points that prevent companies from building and deploying AI solutions. You can check my blog if you need a quick refresher on XAI methods.
One of these XAI methods is decision trees. They have historically gained significant traction due to their interpretability and simplicity. However, many assume that decision trees cannot be accurate, because they look simple and because greedy algorithms like C4.5 and CART do not optimize them well.
That assumption is partially valid, since trees built with greedy algorithms such as C4.5 and CART have the following disadvantages:
- Prone to overfitting, particularly when the tree grows too deep with too many branches. This can result in poor performance on new, unseen data.
- They can be slower to evaluate and make predictions with on large datasets, because each prediction requires multiple decisions based on the input feature values.
- They can struggle with continuous variables, since the tree must split a variable into many small intervals, which increases the complexity of the tree and makes it harder to identify meaningful patterns in the data.
- Often called “greedy” algorithms, they make the locally optimal decision at each step without considering the consequences of those decisions on future steps. CART therefore often outputs suboptimal trees, and there is no principled metric to measure how far from optimal they are.
More sophisticated algorithms, such as ensemble learning methods, are available to address these problems, but they are often considered “black boxes” because of the layered, complex way the underlying algorithms operate.
However, recent work has shown that if you optimize decision trees (instead of building them with greedy methods like C4.5 and CART), they can be surprisingly accurate, in many cases as accurate as black-box models. One algorithm that optimizes trees and addresses some of the disadvantages mentioned above is GOSDT (Generalized and Scalable Optimal Sparse Decision Trees), an algorithm for producing sparse, optimal decision trees.
This blog is intended to provide a gentle introduction to GOSDT and present an example of how it can be applied to a dataset.
This blog is based on a research paper published by some fantastic people. You can read the paper here. This blog is not a substitute for the paper, nor will it go into heavy mathematical detail. Instead, it is a guide for data science professionals to learn about this algorithm and take advantage of it in their daily use cases.
In a nutshell, GOSDT addresses some important issues:
- Handles imbalanced datasets well and optimizes a variety of objective functions (not just accuracy).
- It fully optimizes trees rather than building them greedily.
- It is almost as fast as greedy algorithms, even though it solves an NP-hard optimization problem for decision trees.
- GOSDT uses a dynamically constructed search space, represented with hash trees, to improve efficiency. By limiting the search space and using bounds to identify similar (effectively redundant) splits, it reduces the number of computations required to find the optimal tree. This can significantly improve computation time, especially when working with continuous variables.
- GOSDT applies bounds to partial trees and uses them to eliminate many candidate trees from the search space. This allows the model to focus on the remaining trees (which may be partial trees) and evaluate them more efficiently. By reducing the search space, GOSDT can quickly find the optimal tree and produce a more accurate and interpretable model.
- GOSDT is designed to handle imbalanced data, a common challenge in many real-world applications. It addresses imbalance with a weighted accuracy metric that accounts for the relative importance of different classes in the dataset. This can be particularly useful when there is a predetermined threshold for the desired level of accuracy, as it allows the model to focus on correctly classifying the samples that matter most for the application.
- These trees directly optimize the trade-off between training accuracy and the number of leaves (see the objective sketched just after this list).
- Produces excellent training and test accuracy with a reasonable number of leaves.
- Well suited to highly non-convex problems.
- Most effective for a small to medium number of features, yet it can handle up to tens of thousands of observations while maintaining its speed and accuracy.
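To make that accuracy-versus-size trade-off concrete, the paper frames tree learning as minimizing a single regularized objective (sketched here informally, with λ denoting the per-leaf penalty):

loss(T) = (misclassification error of tree T on the training data) + λ × (number of leaves in T)

GOSDT searches for the tree T that provably minimizes this objective, so a larger λ (the “regularization” hyperparameter in the code below) yields a sparser tree, while a smaller λ allows a larger, potentially more accurate tree.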
It’s time to see it all in action! In my previous blog, I solved a loan application approval problem using Keras Classification. We will use the same dataset to build a classification tree using GOSDT.
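Below is a minimal sketch of what that can look like in Python. It assumes the pip-installable gosdt package, which exposes a GOSDT class that takes a configuration dictionary and provides fit/predict; the exact import path and configuration keys vary between package versions, so check the project documentation. The file name and column names are placeholders standing in for the loan dataset from the previous blog.

```python
# A minimal sketch, assuming the `gosdt` pip package; the import path and
# configuration keys may differ across versions (e.g. `from model.gosdt import GOSDT`
# in older releases), so treat this as illustrative rather than definitive.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from gosdt import GOSDT  # assumed import path

# Placeholder file and column names for the loan-approval dataset
df = pd.read_csv("loan_data.csv")
X = df.drop(columns=["loan_approved"])
y = df["loan_approved"]

# GOSDT searches over binary splits, so features are typically binarized first.
# Here categoricals are one-hot encoded and numeric columns are thresholded at
# their medians as a crude binarization (the follow-up "guessing" work chooses
# thresholds more cleverly).
X_num = X.select_dtypes(include="number")
X_bin_num = (X_num > X_num.median()).astype(int)
X_cat = pd.get_dummies(X.select_dtypes(exclude="number")).astype(int)
X_bin = pd.concat([X_bin_num, X_cat], axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X_bin, y, test_size=0.2, random_state=42
)

config = {
    "regularization": 0.05,  # λ: per-leaf penalty; larger values give sparser trees
    "time_limit": 600,       # seconds before the search stops
    "verbose": True,
    # some versions also accept an "objective" key (e.g. balanced or weighted
    # accuracy), which is one way to handle imbalanced data
}

model = GOSDT(config)
model.fit(X_train, pd.DataFrame(y_train))  # the package expects DataFrames

print("Training accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print(model.tree)  # printable form of the learned tree (representation varies by version)
```

Once fitted, the printed tree can be read as a handful of plain if-then rules over the binarized features, which is exactly the interpretability advantage discussed above.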
Supreet Kaur is an AVP at Morgan Stanley. She is a fitness and technology enthusiast, and the founder of the community DataBuzz.