This article was accepted at the Foundation Model Interventions (MINT) Workshop at NeurIPS 2024.
Following instructions is crucial for building AI agents with large language models (LLMs), as these models must strictly adhere to user-provided guidelines. However, LLMs often fail to follow even simple instructions. To improve instruction-following behavior and prevent undesirable outcomes, we need a deeper understanding of how the internal states of LLMs relate to these outcomes. Our analysis of LLM internal states reveals a dimension in the input embedding space linked to successful instruction following. We show that modifying representations along this dimension improves instruction-following success rates compared to random changes, without compromising response quality. This work provides insight into the inner workings of LLM instruction following, paving the way for more reliable LLM agents.
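To make the kind of intervention described above concrete, here is a minimal, illustrative sketch of steering hidden representations along a fixed direction with a PyTorch forward hook. It is not the paper's implementation; the layer index, the `direction` vector, and the scaling factor `alpha` are placeholders one would obtain from an analysis like the one described here.

```python
# Illustrative sketch (not the paper's code): shift a transformer layer's
# hidden states along a fixed direction during the forward pass.
import torch


def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Return a forward hook that adds alpha * (unit direction) to layer outputs."""
    unit = direction / direction.norm()  # unit vector for a controlled step size

    def hook(module, inputs, output):
        # Many decoder layers return a tuple whose first element holds the
        # hidden states of shape (batch, seq_len, hidden_dim).
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * unit.to(device=hidden.device, dtype=hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered

    return hook


# Hypothetical usage with a Hugging Face causal LM (layer index, `direction`,
# and alpha are assumptions for illustration, not values from the paper):
# layer = model.model.layers[15]
# handle = layer.register_forward_hook(make_steering_hook(direction, alpha=4.0))
# outputs = model.generate(**inputs)
# handle.remove()
```

A baseline comparison, as the abstract suggests, would repeat the same procedure with a randomly sampled direction of equal norm and compare instruction-following success rates and response quality.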