Recent rapid advances in the natural language processing capabilities of large language models (LLMs) have sparked enormous excitement about their potential to achieve human-level intelligence. Their ability to produce remarkably coherent text and engage in dialogue after exposure to vast datasets seems to point toward flexible, general-purpose reasoning skills.
However, a growing chorus of voices urges caution against unbridled optimism by highlighting fundamental blind spots that limit neural approaches. LLMs still frequently make basic logical and mathematical errors that reveal a lack of systematicity behind their answers. Their knowledge remains intrinsically statistical, lacking deeper semantic structure.
More complex reasoning tasks further expose these limitations. LLMs wrestle with causal, counterfactual, and compositional reasoning challenges that require going beyond superficial pattern recognition. Unlike humans, who learn abstract schemas that flexibly recombine modular concepts, neural networks memorize correlations between co-occurring terms. This results in fragile generalization outside of narrow training distributions.
This chasm highlights how human cognition employs structured symbolic representations to enable systematic composability, together with causal models for conceptualizing dynamics. We reason by manipulating modular symbolic concepts according to valid inference rules, chaining logical dependencies, leveraging mental simulations, and postulating mechanisms that relate variables. The inherently statistical nature of neural networks prevents the development of such structured reasoning.
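To make the contrast concrete, the following minimal sketch (an illustration we add here, not a system described in this paper) shows what "chaining logical dependencies" looks like when reasoning is carried out over explicit symbolic rules rather than learned statistical associations. The toy facts and rules are hypothetical.

```python
# Minimal illustrative sketch: forward chaining over explicit symbolic rules.
# Each rule is (set_of_premises, conclusion); conclusions are derived only
# when all premises hold, and newly derived facts can trigger further rules,
# i.e. logical dependencies are chained step by step.

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)  # one inference step in the chain
                changed = True
    return derived

# Hypothetical toy knowledge base.
rules = [
    ({"rain"}, "wet_ground"),
    ({"wet_ground", "freezing"}, "icy_ground"),
    ({"icy_ground"}, "slippery"),
]

print(forward_chain({"rain", "freezing"}, rules))
# {'rain', 'freezing', 'wet_ground', 'icy_ground', 'slippery'}
```

Because every conclusion is licensed by an explicit rule, the derivation generalizes to any combination of facts that satisfies the premises, in contrast to the correlation-bound behavior attributed to neural networks above.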
It remains a mystery how symbolic phenomena arise in LLMs despite their subsymbolic substrate. But clearer recognition of this “hybridization gap” is imperative. True progress requires combining complementary strengths (the flexibility of neural approaches with structured knowledge representations and causal reasoning techniques) to create integrated reasoning systems.
We first describe the growing body of analyses exposing the lack of systematicity, causal understanding, and compositional generalization in neural networks, highlighting differences from innate human faculties.
Below, we detail the most prominent facets of the “reasoning gap,” including struggles with modular skill orchestration, breakdown dynamics, and counterfactual simulation. We bring out innate human capabilities…