Machine learning models for vision and language have improved significantly in recent years, driven by larger model sizes and massive amounts of high-quality training data. Research shows that more training data improves models in a predictable way, leading to scaling laws that describe the relationship between error rates and dataset size. These scaling laws help navigate the trade-off between model size and data size, but they treat the dataset as a whole without considering individual training examples. This is a limitation because some data points are more valuable than others, especially in noisy datasets collected from the web. It is therefore important to understand how each data point or data source affects model training.
The related work falls into two areas. The first is scaling laws for deep learning, which have become popular in recent years. These laws are useful in several ways: understanding the trade-offs between increasing training data and model size, predicting the performance of large models, and comparing how well different learning algorithms perform at smaller scales. The second area, data valuation, focuses on how individual data points affect model performance. These methods typically score training examples based on their marginal contribution. They can identify mislabeled data, filter for high-quality data, upweight useful examples, and select promising new data points for active learning.
Researchers at Stanford University have introduced a new approach by investigating how the value of individual data points scales with dataset size. They found that a data point's contribution to model performance decreases predictably as the dataset grows, following a log-linear pattern. However, the rate of decrease varies across data points: some points are most useful in smaller datasets, while others become relatively more valuable in larger ones. To learn these individual scaling patterns efficiently, the researchers introduced a maximum likelihood estimator and an amortized estimator that work from a small number of noisy observations per data point.
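The log-linear pattern described above means that, on a log-log plot, a data point's marginal contribution falls on a straight line as dataset size grows, so its trend can be fit with ordinary least squares. The sketch below illustrates this with made-up, illustrative numbers (not results from the paper): it fits a power-law decay to noisy contribution measurements and extrapolates to a larger dataset size.

```python
import numpy as np

# Hypothetical noisy measurements of one data point's marginal
# contribution (test-loss reduction) at several dataset sizes k.
# These numbers are illustrative only, not taken from the paper.
ks = np.array([100, 200, 400, 800, 1600])
contribs = np.array([4.1e-3, 2.2e-3, 1.1e-3, 5.8e-4, 3.0e-4])

# Log-linear model: contribution(k) ~ c * k**(-alpha),
# i.e. log(contribution) is linear in log(k). Fit by least squares.
slope, intercept = np.polyfit(np.log(ks), np.log(contribs), 1)
alpha, c = -slope, np.exp(intercept)

# Extrapolate the fitted law to a dataset size beyond those observed.
pred_3200 = c * 3200 ** (-alpha)
print(f"alpha={alpha:.2f}, predicted contribution at k=3200: {pred_3200:.2e}")
```

Because each point gets its own `alpha` and `c`, two points whose fitted lines cross will swap in relative value as the dataset grows, matching the observation that some points matter more at small scale and others at large scale.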
Experiments are conducted to provide evidence for the parametric scaling law, focusing on three model types: logistic regression, SVMs, and MLPs (specifically, two-layer ReLU networks). These models are tested on three datasets: MiniBooNE, CIFAR-10, and IMDB movie reviews. Pre-trained embeddings, frozen ResNet-50 for CIFAR-10 and BERT for IMDB, are used to speed up training and avoid underfitting. The performance of each model is measured using cross-entropy loss on a test set of 1,000 samples. For logistic regression, 1,000 data points and 1,000 samples per dataset size k are used. For SVMs and MLPs, due to the higher variance in marginal contributions, 200 data points and 5,000 samples per dataset size k are used.
The proposed methods are tested on how accurately they predict marginal contributions at each dataset size. For example, with the IMDB dataset and logistic regression, expected contributions can be accurately predicted for dataset sizes ranging from k = 100 to k = 1000. To evaluate this systematically, the accuracy of the scaling-law predictions at different dataset sizes is reported for both the maximum likelihood and amortized estimators using different numbers of samples. A more detailed version of these results shows that the R² score degrades when predictions are extrapolated beyond k = 2500, while the correlation and rank correlation with the actual expected contributions remain high.
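The three evaluation metrics mentioned above can be computed directly. The sketch below shows R² and Spearman rank correlation between predicted and observed contributions for a handful of points at one dataset size; the numbers are purely illustrative, not results from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical observed vs. predicted mean marginal contributions
# for five data points at one dataset size (illustrative numbers only).
observed = np.array([3.0e-3, 1.5e-3, 8.0e-4, 4.0e-4, 2.5e-4])
predicted = np.array([2.8e-3, 1.6e-3, 7.5e-4, 4.5e-4, 2.0e-4])

# R^2: fraction of variance in observed values explained by the predictions.
ss_res = np.sum((observed - predicted) ** 2)
ss_tot = np.sum((observed - observed.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Rank correlation: is the relative ordering of points preserved?
rho, _ = spearmanr(observed, predicted)
print(f"R^2 = {r2:.3f}, Spearman rho = {rho:.3f}")
```

The distinction matters for the extrapolation result: R² punishes errors in magnitude, which grow when the law is extended far beyond the fitted range, while rank correlation only measures whether the more valuable points are still ranked ahead of the less valuable ones.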
In conclusion, the Stanford researchers developed a new method by examining how the value of individual data points changes with scale. They found evidence of a simple pattern that holds across different datasets and model types. Experiments confirmed this scaling law by showing a clear log-linear trend and by testing how well it predicts contributions across dataset sizes, including sizes larger than those used for fitting. However, measuring this behavior for every point in a full training dataset is expensive, so the researchers developed estimators that recover the scaling parameters from a small number of noisy observations per data point.
Review the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final-year student at IIT Kharagpur. As a technology enthusiast, he delves into practical applications of AI, focusing on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.