ToolSandbox: An interactive, conversational, stateful assessment benchmark for LLM tool usage capabilities
Recent advances in large language models (LLMs) have sparked growing research interest in tool-assisted LLMs that solve real-world challenges, requiring ...