Static analysis is an integral part of the software development process, enabling activities such as bug finding, program optimization, and debugging. Traditional approaches suffer from two main drawbacks: methods that depend on code compilation are bound to fail in any development scenario where the code is incomplete or changing rapidly, and adapting an analysis requires deep knowledge of compiler internals and intermediate representations (IRs) that is inaccessible to many developers. These issues keep static analysis tools from being widely used in real-world scenarios.
Existing static analysis tools, such as FlowDroid and Infer, operate on IRs to detect problems in programs. Because they depend on compilation, however, their usability in dynamic or incomplete codebases is limited. They also offer little support for tailoring analysis tasks to the needs of specific users; customization requires deep knowledge of compiler infrastructure. Query-based systems like CodeQL, which aim to mitigate these limitations, introduce a steep learning curve of their own, stemming from intricate domain-specific languages and extensive APIs. These deficiencies limit their effectiveness and adoption across programming contexts.
Researchers from Purdue University, the Hong Kong University of Science and Technology, and Nanjing University have designed LLMSA, a neurosymbolic framework that aims to break the bottlenecks of traditional static analysis by enabling compilation-free operation and full customization. LLMSA uses a Datalog-style policy language to decompose complex analysis tasks into smaller, more manageable subproblems. The methodology mitigates hallucination errors in language models by combining deterministic analysis of syntactic attributes with neural reasoning over semantic elements. In addition, techniques such as lazy evaluation, in which neural computations are postponed until needed, and incremental and parallel processing, which reuse prior results and exploit concurrency, improve the utilization of computational resources while minimizing redundancy. This architecture positions LLMSA as a versatile and robust alternative to conventional static analysis techniques.
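To make the decomposition idea concrete, the sketch below shows how a taint-leak query might be broken into a handful of Datalog-style rules, with syntactic subproblems resolved deterministically and one semantic subproblem delegated to the model. The rule syntax, the predicate names (source, sink, step, llm_propagates), and the symbolic/neural split are illustrative assumptions, not the paper's exact policy language.

```python
# A minimal sketch of how a taint-leak query might be decomposed into
# Datalog-style rules. The syntax, predicate names, and the split between
# symbolic and neural rules are illustrative assumptions only.
POLICY = {
    # Syntactic subproblems: answerable deterministically from the parse tree.
    "symbolic": [
        'source(v)  :- call(c, "readLine"), retval(c, v).',
        'sink(c, a) :- call(c, "exec"), arg(c, 0, a).',
        'step(x, y) :- assign(y, x).',
    ],
    # Semantic subproblem: whether a callee propagates tainted data is judged
    # by the language model rather than by syntax alone.
    "neural": [
        'step(x, y) :- llm_propagates(x, y).',
    ],
    # Composition: a leak is a source value that transitively reaches a sink.
    "composite": [
        'leak(v, c) :- source(v), step*(v, a), sink(c, a).',
    ],
}

if __name__ == "__main__":
    for kind, rules in POLICY.items():
        print(f"-- {kind} rules --")
        for rule in rules:
            print(" ", rule)
```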
The proposed framework combines symbolic and neural components to meet these objectives. Symbolic constructors operate deterministically over abstract syntax trees (ASTs) to extract syntactic features, while neural components apply large language models (LLMs) to reason about semantic relationships. A restricted Datalog-style policy language lets users specify tasks intuitively by breaking them down into precise rules. Lazy evaluation reduces computational cost by performing neural operations only when necessary, incremental processing avoids redundant computation across iterations, and concurrent execution runs independent rules simultaneously, greatly improving performance. The framework has been evaluated on Java programs for tasks such as alias analysis, program slicing, and bug detection, demonstrating its versatility and scalability.
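As a rough illustration of the execution strategy, the following Python sketch combines a deterministic, AST-derived syntactic check with a lazily evaluated, cached neural judgment. The toy fact table, the ask_llm stub, and the prompt wording are assumptions made for this example and do not reflect LLMSA's actual implementation.

```python
# A minimal sketch of lazy, cached evaluation of a hybrid predicate.
# The toy facts, the ask_llm() stub, and the prompt text are assumptions for
# illustration only; they are not LLMSA's actual interfaces.
from functools import lru_cache

# --- Symbolic side: facts that could be read deterministically off the AST ---
ASSIGNMENTS = {              # variable -> expression it is assigned from (toy facts)
    "query": "userInput",
    "cmd": "sanitize(userInput)",
}

def syntactic_flow(src: str, dst: str) -> bool:
    """Deterministic check: dst is assigned an expression that mentions src."""
    return src in ASSIGNMENTS.get(dst, "")

# --- Neural side: a semantic judgment, invoked only when syntax alone is inconclusive ---
def ask_llm(prompt: str) -> bool:
    """Stub standing in for a language-model call."""
    print(f"[LLM call] {prompt}")
    return "sanitize" not in prompt   # toy heuristic in place of model reasoning

@lru_cache(maxsize=None)              # incremental reuse: never re-ask the same question
def semantic_flow(src: str, dst: str) -> bool:
    expr = ASSIGNMENTS.get(dst, "")
    return ask_llm(f"Does tainted value '{src}' still reach '{dst}' via '{expr}'?")

def flows(src: str, dst: str) -> bool:
    # Lazy evaluation: the expensive neural predicate runs only when the cheap
    # syntactic check has already connected the two variables.
    return syntactic_flow(src, dst) and semantic_flow(src, dst)

if __name__ == "__main__":
    print(flows("userInput", "query"))   # syntactic hit -> one LLM call -> True
    print(flows("userInput", "cmd"))     # syntactic hit -> one LLM call -> False (sanitized)
    print(flows("userInput", "query"))   # answered from cache -> no extra LLM call
```

In a full pipeline, independent rules of this kind could also be dispatched concurrently, which is the parallelism the authors describe.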
LLMSA performed well across a variety of static analysis tasks. It achieved 72.37% precision and 85.94% recall for alias analysis, and 91.50% precision and 84.61% recall for program slicing. On bug detection tasks it averaged 82.77% precision and 85.00% recall, outperforming dedicated tools such as NS-Slicer and Pinpoint by a clear F1-score margin. The methodology also identified 55 of 70 taint vulnerabilities in the TaintBench dataset, with a recall that exceeded an industrial-grade tool by 37.66% alongside a significant improvement in F1 score. In terms of computational efficiency, LLMSA ran up to 3.79 times faster than alternative designs, demonstrating its potential to carry out diverse analysis tasks efficiently and competently.
This research presents LLMSA as a transformative approach to static analysis that overcomes challenges related to compilation dependency and limited customization. By pairing the neurosymbolic framework with a well-defined policy language, it achieves high performance, scalability, and flexibility across different analysis tasks. Its efficiency and versatility make LLMSA a valuable resource for bringing advanced static analysis methods into software development.
Check out the Paper. All credit for this research goes to the researchers of this project.
Aswin AK is a Consulting Intern at MarkTechPost. He is pursuing his dual degree at the Indian Institute of Technology Kharagpur. He is passionate about data science and machine learning, and brings a strong academic background and practical experience in solving real-life interdisciplinary challenges.