Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures, collectively referred to as "hallucinations." Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs, and that this information can be used to detect errors. In this work, we show that the internal representations of LLMs encode much more information about truthfulness than previously recognized. We first discover that truthfulness information is concentrated in specific tokens, and that leveraging this property significantly improves error detection performance. However, we show that such error detectors fail to generalize across datasets, implying that, contrary to prior claims, truthfulness encoding is not universal but rather multifaceted. Next, we show that internal representations can also be used to predict the types of errors the model is likely to make, facilitating the development of tailored mitigation strategies. Finally, we reveal a discrepancy between LLMs' internal encoding and their external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model's internal perspective, which can guide future research on improving error analysis and mitigation.
† Work done in part during an internship at Apple
‡ Technion – Israel Institute of Technology
§ Google Research