Yandex introduces TabReD: a new benchmark for tabular machine learning

In recent years, research on tabular machine learning has grown rapidly. However, it still poses significant challenges for researchers and practitioners. Traditionally, academic benchmarks for tabular machine learning have not fully represented the complexities found in real-world industrial applications.

Most available datasets lack the temporal metadata required for time-based splits or come from less extensive data acquisition and feature engineering processes compared to typical industry ML practices. This can influence the types and amounts of predictive, uninformative, and correlated features, affecting model selection. These limitations can lead to overly optimistic performance estimates when models evaluated on these benchmarks are deployed in real-world ML production scenarios.

To address these gaps, researchers from Yandex and HSE University have introduced TabReD, a new benchmark designed to faithfully reflect industrial-grade tabular data applications. TabReD consists of eight datasets from real-world applications spanning domains such as finance, food delivery, and real estate. The team has made the code and datasets publicly available on GitHub.

Building the TabReD benchmark

To build TabReD, the researchers drew on datasets from Kaggle competitions and Yandex ML applications. They applied four rules: datasets must be tabular, feature engineering must match industry practices, datasets with data leaks must be excluded, and datasets must have timestamps and enough samples for time-based splits. Under the last rule, datasets without future instances were excluded.

The eight datasets in the TabReD benchmark include the following:

Home Insurance: Predicts whether a customer will purchase home insurance based on user and policy characteristics.
Ecommerce Offers: Classifies whether a customer will redeem a discount offer based on transaction history.
Credit Default: Predicts whether bank customers will default on a loan, using extensive internal and external data, focusing on the stability of the model over time.
Sberbank Housing: Predicts the sale price of properties in the Moscow real estate market, using detailed economic and property indicators.
Cook Time: Estimates the time it takes a restaurant to prepare an order based on the contents of the order and historical cook times.
Delivery ETA: Predicts estimated arrival time for online grocery orders using courier service availability, navigation data, and historical delivery information.
Map Routes: Estimates travel time in a car navigation system based on current road conditions and route details.
Climate: Forecasts temperature using weather station measurements and physical models.

These datasets have two key practical properties that are often missing in academic benchmarks. First, they are split into training, validation, and test sets based on timestamps, which is essential for accurate evaluation. Second, they include more features due to extensive data acquisition and feature engineering efforts.
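As a rough illustration of the first property, a time-based split can be sketched in a few lines. This is a minimal sketch, not the TabReD code: the function name `time_based_split`, the 70/15/15 proportions, and the toy rows are all assumptions made for this example.

```python
from datetime import date, timedelta


def time_based_split(rows, timestamp_key, train_frac=0.7, val_frac=0.15):
    """Split rows into train/validation/test in timestamp order, oldest first."""
    ordered = sorted(rows, key=lambda row: row[timestamp_key])
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Hypothetical toy data: one record per day with a made-up target value.
start = date(2024, 1, 1)
rows = [
    {"ts": start + timedelta(days=i), "cook_minutes": 10 + i % 7}
    for i in range(100)
]

train, val, test = time_based_split(rows, "ts")

# Every validation record is newer than every training record,
# and every test record is newer than every validation record.
print(train[-1]["ts"], val[0]["ts"], val[-1]["ts"], test[0]["ts"])
```

The point of the design is that the model never sees records from the period it is evaluated on, which mirrors how a deployed model only ever predicts on data that arrives after training.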

Experimental results and future research

The researchers tested recent deep learning methods for tabular data on the TabReD benchmark to evaluate their performance with time-based data splits and additional features.

They concluded that time-based data splits were crucial for proper evaluation. The choice of splitting strategy significantly affected all aspects of model comparison: absolute metric values, relative performance differences, standard deviations, and the relative ranking of the models.
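The effect of the splitting strategy on absolute metric values can be seen even on a toy problem. The sketch below uses synthetic data invented for this example (a target that drifts steadily upward over time) and a deliberately trivial predictor that outputs the training mean; none of it comes from the TabReD study.

```python
import random


def constant_baseline_mae(train_targets, test_targets):
    """MAE of a constant predictor that always outputs the training mean."""
    prediction = sum(train_targets) / len(train_targets)
    errors = [abs(y - prediction) for y in test_targets]
    return sum(errors) / len(errors)


# Synthetic targets with a steady upward drift (list index = time step).
targets = [float(t) for t in range(100)]

# Random split: shuffle with a fixed seed, then hold out 20 samples.
shuffled = targets[:]
random.Random(0).shuffle(shuffled)
random_mae = constant_baseline_mae(shuffled[:80], shuffled[80:])

# Time-based split: train on the first 80 steps, test on the last 20.
time_mae = constant_baseline_mae(targets[:80], targets[80:])

print(f"random split MAE:     {random_mae:.1f}")
print(f"time-based split MAE: {time_mae:.1f}")
```

On this drifting data the random split reports a noticeably lower error than the time-based split, because the shuffled test set contains samples from the same period as the training set. That is the kind of optimistic estimate a time-based split is meant to avoid.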

The results identified an MLP with embeddings for continuous features as a simple but effective deep learning baseline, while more advanced recent models showed less impressive performance in this setting.
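The article does not say which embedding scheme the study used. As one illustration of the general idea, the sketch below expands a single continuous value into a vector with a clamped piecewise-linear encoding, which is one known way to represent a numeric feature before it reaches an MLP. The function name and bin edges are hypothetical, and a learned embedding would add trainable parameters on top of an encoding like this.

```python
def piecewise_linear_encode(x, bin_edges):
    """Encode scalar x as one value in [0, 1] per bin.

    Bins that lie entirely below x give 1.0, bins entirely above x give 0.0,
    and the bin containing x gives the fractional position of x inside it.
    """
    encoded = []
    for left, right in zip(bin_edges[:-1], bin_edges[1:]):
        ratio = (x - left) / (right - left)
        encoded.append(min(1.0, max(0.0, ratio)))
    return encoded


# Hypothetical bin edges for a feature such as "route distance in km".
edges = [0.0, 1.0, 5.0, 10.0]

vector = piecewise_linear_encode(3.0, edges)
print(vector)  # [1.0, 0.5, 0.0]
```

Compared with feeding the raw scalar to the network, a vector like this gives the first linear layer a separate weight for each value range of the feature.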

TabReD bridges the gap between academic research and industrial application in tabular machine learning. By providing a benchmark that closely reflects real-world scenarios, it enables researchers to develop and evaluate models that are more likely to perform well in production environments. This is crucial for new research findings to be adopted effectively in practical applications.

The TabReD benchmark study lays the groundwork for exploring other avenues of research, such as continual learning, handling gradual temporal shifts, and improving feature selection and engineering techniques. It also highlights the need for robust evaluation protocols that better assess the true performance of machine learning models in dynamic, real-world environments.

Review the Paper and GitHub. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary engineer and entrepreneur, Asif is committed to harnessing the potential of AI for social good. His most recent initiative is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understandable to a wide audience. The platform has over 2 million monthly views, illustrating its popularity among the public.