This paper was accepted to the Safe Generative AI Workshop (SGAIW) at NeurIPS 2024.
Large language models (LLMs) could be valuable personal AI agents across a range of domains, provided they can accurately follow user instructions. However, recent studies have revealed significant limitations in the instruction-following capabilities of LLMs, raising concerns about their reliability in high-stakes applications. Accurately estimating the uncertainty of LLMs when they follow instructions is critical to mitigating deployment risks. We present, to the best of our knowledge, the first systematic evaluation of the uncertainty estimation capabilities of LLMs in the context of instruction following. Our study identifies key challenges with existing instruction-following benchmarks, where multiple confounding factors intertwine with the uncertainty arising from instruction following, complicating the isolation and comparison of uncertainty estimation methods and models. To address these issues, we introduce a controlled evaluation setup with two versions of reference data, enabling a comprehensive comparison of uncertainty estimation methods under various conditions. Our findings show that existing uncertainty methods struggle, particularly when models make subtle instruction-following errors. While internal model states offer some improvement, they remain inadequate in more complex scenarios. The insights from our controlled evaluation setups shed light on the limitations of LLMs and the potential of uncertainty estimation in instruction-following tasks, paving the way for more reliable AI agents.
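To give a concrete sense of the kind of baseline evaluated here, the sketch below shows a common likelihood-based uncertainty estimate: the mean log-probability a model assigns to its own response given the instruction. This is a generic illustration, not the paper's specific method; the model name, prompt, and helper function are placeholders.

```python
# Minimal sketch of a sequence-likelihood uncertainty baseline (not the
# paper's method). Scores a response by the mean log-probability the model
# assigns to its tokens, conditioned on the instruction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_confidence(model, tokenizer, prompt, response):
    """Return the mean log-probability of `response` given `prompt`;
    higher values suggest the model is more confident in its output."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # shape: (1, seq_len, vocab_size)
    # Log-probabilities of each next token given the preceding context.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    resp_start = prompt_ids.shape[1] - 1  # index of first response token's prediction
    return token_lp[:, resp_start:].mean().item()

# Example usage (placeholder model):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# score = sequence_confidence(lm, tok, "List three fruits:\n", "apple, banana, cherry")
```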