In natural language processing (NLP), a central question is how well the probabilities generated by language models (LMs) align with human linguistic behavior. This alignment is often assessed by comparing LM scores with human acceptability judgments, which rate how natural a sentence sounds. Previous approaches, such as SLOR (Syntactic Log-Odds Ratio), have attempted to close this gap, but significant problems remain. SLOR applies the same correction for sequence length and unigram frequency to every model, which can lead to inaccuracies. A more flexible approach is needed, one that can accommodate differences between models and the complexities of human language processing.
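SLOR's uniform correction can be written down in a few lines. The sketch below shows the standard formulation (the toy numbers are invented for illustration): subtract the sentence's unigram log probability from the LM log probability and divide by length, with the same normalization applied regardless of which model produced the score.

```python
def slor(lm_logprob, unigram_logprob, length):
    """Syntactic Log-Odds Ratio: subtract the unigram log probability
    and normalize by sentence length. The same fixed correction is
    applied no matter which model produced the score."""
    return (lm_logprob - unigram_logprob) / length

# Toy example: a 5-token sentence with LM log probability -20.0 whose
# tokens have a combined unigram log probability of -35.0.
print(slor(-20.0, -35.0, 5))  # (-20 - (-35)) / 5
```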
MORCELA: A new linking theory
A team of NYU and CMU researchers proposes MORCELA (Magnitude Optimized Regression to Control for Effects on Linguistic Acceptability), a new linking theory that addresses these challenges. Unlike SLOR, which applies static corrections for unigram frequency and sentence length, MORCELA estimates the optimal degree of adjustment from data, using learned parameters specific to these effects. By incorporating a parameter β for unigram frequency and a parameter γ for sentence length, MORCELA adjusts LM scores so that they correlate better with human judgments. This approach also sheds light on how LMs treat word rarity and sentence length relative to human expectations. The central idea behind MORCELA is that not all models should receive the same correction, since models differ in how well their raw probabilities predict linguistic acceptability.
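As a rough sketch of the idea (the paper's exact parameterization may differ), a MORCELA-style score can be viewed as a generalization of SLOR's numerator in which the frequency and length corrections are scaled by learned weights β and γ rather than applied uniformly:

```python
def morcela_score(lm_logprob, unigram_logprob, length, beta, gamma):
    """MORCELA-style adjusted score (illustrative form only): the
    learned weights beta and gamma control how strongly the
    unigram-frequency and sentence-length corrections are applied
    for a given model."""
    return lm_logprob - beta * unigram_logprob - gamma * length

# A smaller beta means the model needs less frequency correction;
# beta = 1, gamma = 0 recovers a full, uniform unigram correction.
print(morcela_score(-20.0, -35.0, 5, beta=0.5, gamma=0.2))
```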
Technical description
MORCELA works by incorporating parameters that are fit to human acceptability judgments. These parameters control the degree of correction applied to LM log probabilities, making MORCELA more adaptive than predecessors such as SLOR. Specifically, the learned parameter β scales the correction for unigram frequency, while γ scales the correction for sentence length. This flexibility allows MORCELA to match human acceptability ratings more closely, especially for larger models. For example, larger models, which tend to have a more nuanced command of language, often require smaller unigram-frequency corrections because they are better at predicting rare words in context.
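The fitting step can be sketched as an ordinary least-squares regression of human ratings on the LM log probability, the unigram log probability, and sentence length. Everything below is illustrative: the feature values and ratings are invented, and the actual MORCELA estimation procedure may differ in its details.

```python
import numpy as np

# Hypothetical per-sentence data (invented numbers for illustration):
lm_lp      = np.array([-18.0, -42.5, -30.1, -25.4, -50.2, -22.3])
unigram_lp = np.array([-30.0, -55.0, -44.0, -38.0, -62.0, -33.0])
length     = np.array([5.0, 9.0, 7.0, 6.0, 10.0, 5.0])
human      = np.array([6.1, 3.2, 4.5, 5.0, 2.8, 5.7])  # mean ratings

# Regress human ratings on the three predictors plus an intercept;
# the fitted weights on unigram_lp and length play the roles of the
# frequency (beta) and length (gamma) corrections.
X = np.column_stack([lm_lp, unigram_lp, length, np.ones_like(lm_lp)])
coefs, _, _, _ = np.linalg.lstsq(X, human, rcond=None)
predicted = X @ coefs
```

Because the weights are estimated per model, a model whose raw probabilities already track acceptability well will simply receive small corrections.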
Performance and meaning
The importance of MORCELA becomes evident when considering its performance across LM sizes. MORCELA outperformed SLOR in predicting human acceptability judgments for models from two well-known families, Pythia and OPT. The results showed that as the models grew, MORCELA's correlation with human judgments improved. The optimal parameter values estimated by MORCELA revealed that larger LMs are more robust to frequency and length effects and require less correction. This suggests that larger LMs have a better grasp of linguistic context, allowing them to predict the acceptability of rare words more accurately and reducing the impact of unigram frequency as a confounding factor. MORCELA improved the correlation between LM scores and human judgments by up to 46% relative to SLOR, demonstrating its ability to calibrate corrections more accurately.
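Evaluating a linking theory then reduces to measuring how well its adjusted scores correlate with human ratings. The snippet below computes a Pearson correlation by hand; the score values are invented, and the paper's actual evaluation setup (datasets, correlation measure) may differ.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Invented scores for five sentences, for illustration only.
human_ratings  = [6.1, 3.2, 4.5, 5.0, 2.8]
slor_scores    = [2.1, 0.9, 1.4, 1.8, 0.7]
morcela_scores = [2.4, 0.8, 1.5, 1.9, 0.6]

print("SLOR r:   ", pearson(slor_scores, human_ratings))
print("MORCELA r:", pearson(morcela_scores, human_ratings))
```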
This advance is important for several reasons. First, it suggests that current LMs may reflect human language processing better than previously thought, provided appropriate corrections are applied. Second, insights from MORCELA may be valuable in psycholinguistic studies that use LMs as surrogates for human language understanding. By providing a more precise linking theory, MORCELA ensures that LMs are evaluated in a way that aligns more closely with human linguistic intuition. For example, a key result from the MORCELA experiments showed that larger LMs relied less on unigram-frequency corrections, indicating that these models handle infrequent and context-specific words better. This could significantly affect how we interpret LM scores in tasks involving rare or domain-specific language.
Conclusion
MORCELA represents an important advance in aligning LM scores with human acceptability judgments. Using learned parameters to dynamically adjust the length and frequency corrections addresses critical flaws in previous approaches such as SLOR. The results show that, with proper tuning, LMs can better reflect human linguistic intuition, particularly as model size increases. Future work could explore further adjustments or new parameters that bring LMs even closer to human-like language understanding. MORCELA not only improves the evaluation of LMs but also provides valuable insight into how these models process language, bridging the gap between machine-generated probabilities and human linguistic behavior.
Check out the Paper. All credit for this research goes to the researchers of this project.
Aswin AK is a consulting intern at MarkTechPost. He is pursuing a dual degree at the Indian Institute of Technology Kharagpur. He is passionate about data science and machine learning, and brings a strong academic background and hands-on experience solving real-life interdisciplinary challenges.