Logs provide important information and are often the first sign of system problems, making them an essential tool for software maintenance and fault diagnosis. They must be analyzed effectively to support automated tasks such as anomaly detection, troubleshooting, and root cause analysis. Converting semi-structured log messages into structured templates is known as log parsing, and it is a prerequisite for these automated tasks.
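To make the idea of a template concrete, here is a minimal sketch of turning raw messages into a structured template by masking the variable parts. The masking rules and the `to_template` helper are hypothetical illustrations, not part of HELP; this is the kind of hand-written heuristic that the approaches discussed below try to move beyond.

```python
import re

# Hypothetical masking rules, chosen only for illustration.
# Order matters: IP addresses must be masked before plain integers.
_VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hexadecimal values
    re.compile(r"\b\d+\b"),                      # plain integers
]


def to_template(message: str) -> str:
    """Replace variable-looking tokens with the <*> placeholder."""
    for pattern in _VARIABLE_PATTERNS:
        message = pattern.sub("<*>", message)
    return message


# Two messages that differ only in their parameters share one template.
print(to_template("Connection from 10.0.0.5 closed after 342 ms"))
print(to_template("Connection from 10.0.0.9 closed after 7 ms"))
```

Both calls yield the same template string, which is what lets downstream tools count, group, and compare log events instead of raw text.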
However, log parsing faces several obstacles in real-world systems, which often lead to performance issues. These shortcomings can be attributed to three main factors:
- Reliance on heuristic-based parsers: Traditional log parsers typically use heuristic-based techniques, which require hand-crafted functions and deep knowledge of a particular domain. These techniques struggle to scale successfully across different systems, although they can perform admirably in restricted contexts. Generalizing these parsers to handle the wide range of log formats and structures found in large-scale systems is challenging, as they require manually constructed rules.
- Limitations of Large Language Model (LLM)-based parsers: Several contemporary log parsers use LLMs to analyze log data. These LLM-based parsers typically operate offline and process logs in batches at regular intervals. This limits their usefulness in real-time applications, where fast log analysis is essential to locate and fix problems as soon as they arise. The inherent delay of offline processing makes these parsers less useful when anomalies demand a quick reaction.
- Difficulties with online parsing algorithms: Some log parsers are designed to operate online and handle logs as they are generated in real time, but they have their own set of difficulties. One major problem is log drift, which occurs when small changes to the content or format of logs over time lead to an increase in false positives. False positives can overload the system, masking real anomalies and preventing the timely identification and resolution of real problems.
Recent research has proposed the Hierarchical Embedding-based Log Parser (HELP) as a solution to these problems. HELP is a semantic-based online log parser that leverages LLMs to parse logs efficiently and at low cost. What sets HELP apart from other log parsers is its hierarchical embedding module, which optimizes a text embedding model for log data. By clustering logs before they are parsed, this approach reduces the cost and complexity of processing log data by several orders of magnitude.
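The idea of clustering logs before parsing can be sketched as follows. This is a simplified illustration under stated assumptions, not HELP's implementation: `toy_embed` is a bag-of-words stand-in for a real text embedding model, the `OnlineClusterer` class and its `threshold` value are hypothetical, and `expensive_calls` merely counts how often a costly parsing step (such as an LLM query) would be needed if it ran once per new cluster.

```python
import math
import re
from collections import Counter


def toy_embed(message):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", message.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class OnlineClusterer:
    """Assigns each incoming log to the nearest cluster, or opens a new one."""

    def __init__(self, threshold=0.8):  # threshold is an assumed value
        self.threshold = threshold
        self.centroids = []       # summed vectors; cosine ignores their scale
        self.expensive_calls = 0  # times the costly parsing step would run

    def assign(self, message):
        vec = toy_embed(message)
        best_id, best_sim = -1, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim
        if best_id >= 0 and best_sim >= self.threshold:
            self.centroids[best_id].update(vec)  # fold message into cluster
            return best_id
        # No close cluster: open a new one and pay for parsing once.
        self.centroids.append(Counter(vec))
        self.expensive_calls += 1
        return len(self.centroids) - 1


clusterer = OnlineClusterer()
logs = [
    "Connection from 10.0.0.5 closed after 342 ms",
    "Connection from 10.0.0.9 closed after 7 ms",
    "Disk usage at 91 percent on volume 3",
    "Disk usage at 45 percent on volume 1",
]
ids = [clusterer.assign(line) for line in logs]
print(ids, clusterer.expensive_calls)
```

In this sketch, four log lines fall into two clusters, so the costly step would run twice rather than four times. On a real stream with millions of lines and a few thousand templates, the same principle is what makes the savings large.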
HELP also includes an iterative rebalancing module to address the problem of log drift. This module routinely updates the current log groupings, so the parser remains accurate and functional even as log formats change over time. By continually refining its understanding of log data, HELP maintains a high degree of accuracy in recognizing genuine anomalies while reducing the frequency of false positives.
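The article does not describe how the rebalancing step works internally. One plausible reading, sketched below purely as an assumption, is a periodic pass that merges clusters whose centroids have drifted close together. The `rebalance` function, its `merge_threshold`, and the cluster representation are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rebalance(clusters, merge_threshold=0.95):
    """Merge near-duplicate clusters.

    clusters maps cluster_id -> (centroid, size).
    Returns (merged_clusters, remap), where remap sends every old id
    to the id of the cluster that absorbed it.
    """
    merged = {}
    remap = {}
    for cid in sorted(clusters):
        centroid, size = clusters[cid]
        target = None
        for kept_id, (kept_centroid, _) in merged.items():
            if cosine(centroid, kept_centroid) >= merge_threshold:
                target = kept_id
                break
        if target is None:
            # Nothing similar enough: keep this cluster as it is.
            merged[cid] = (list(centroid), size)
            remap[cid] = cid
        else:
            # Absorb into the earlier cluster using a size-weighted mean.
            kept_centroid, kept_size = merged[target]
            total = kept_size + size
            new_centroid = [
                (k * kept_size + c * size) / total
                for k, c in zip(kept_centroid, centroid)
            ]
            merged[target] = (new_centroid, total)
            remap[cid] = target
    return merged, remap


clusters = {
    0: ([1.0, 0.0], 3),
    1: ([0.0, 1.0], 2),
    2: ([0.99, 0.05], 1),  # a drifted variant of cluster 0
}
merged, remap = rebalance(clusters)
print(remap)
```

Here cluster 2 is folded into cluster 0, so logs that drifted slightly keep mapping to the existing template instead of surfacing as a new, suspicious-looking pattern.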
The effectiveness of HELP has been extensively evaluated on 14 large-scale public datasets. HELP showed significantly higher F1-weighted clustering and parsing accuracy than state-of-the-art online log parsers. Beyond these benchmarks, HELP has been integrated into Iudex’s production observability platform. This real-world deployment validates the feasibility and reliability of HELP for high-performance log processing in production contexts.
The team has summarized its main contributions as follows.
- HELP, the first log parser to use semantic embeddings, has been developed to facilitate online log clustering and parsing.
- HELP has been effectively implemented in a real production environment, allowing its applicability to be verified. Its periodic rebalancing feature helps to avoid template drift and ensures real-time log pattern allocation.
- HELP has been extensively tested on 14 public log datasets and has been found to outperform all other state-of-the-art log parsers in log parsing and clustering accuracy. Moreover, without sacrificing speed, HELP can be modified to become a parallel batch processing framework.
In conclusion, HELP is a significant advancement in log processing technology. By combining the capabilities of LLMs with hierarchical embeddings and iterative rebalancing, it offers a scalable, reliable, and efficient solution for real-time log parsing in contemporary software systems.
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.