Meta AI releases CyberSecEval 3: a comprehensive evaluation framework for LLM security used in model development

The cybersecurity risks, benefits, and capabilities of ai systems are crucial to ai security and policy. As ai becomes increasingly embedded in various aspects of our lives, the possibility of malicious exploitation of these systems becomes a significant threat. Generative ai models and products are particularly susceptible to attacks due to their complex nature and reliance on large amounts of data. Developers require a comprehensive cybersecurity risk assessment that ensures the safety and reliability of ai systems, protects sensitive data, prevents system failures, and maintains public trust.

Meta ai introduces CYBERSECEVAL 3 to address the cybersecurity risks, benefits, and capabilities of ai systems, specifically focusing on large language models (LLMs) such as Llama 3 models. Previous benchmarks, CYBERSECEVAL 1 and 2, have assessed various risks associated with LLMs, including the generation of exploits and insecure code outputs. These benchmarks highlighted the models’ susceptibility to pulse injection attacks and their propensity to assist in cyberattacks. Building on CYBERSECEVAL 1 and 2, Meta ai’s CYBERSECEVAL 3 extends the assessment to new areas of offensive security capabilities. The tool measures the capabilities of Llama 3 405b, Llama 3 70b, and Llama 3 8b models in automated social engineering, escalation of manual offensive cyber operations, and autonomous cyber operations.

To evaluate the offensive cybersecurity capabilities of the Llama 3 models, the researchers conducted a series of empirical tests, including:

1. Automated social engineering via spear-phishing: Researchers simulated spear-phishing attacks using the Llama 3 405b model, comparing its performance to other models such as GPT-4 Turbo and Qwen 2-72b-instruct. The evaluation involved generating detailed victim profiles and assessing the persuasiveness of LLMs in phishing dialogues. Results showed that while Llama 3 405b could automate moderately persuasive spear-phishing attacks, it was no more effective than existing models and risks could be mitigated by implementing guardrails such as Llama Guard 3.

2. Scalability of manual offensive cyber operations: Researchers evaluated the effectiveness of Llama 3 405b in assisting cyber attackers in a “capture the flag” simulation. Participants included both experts and novices. The study found no statistically significant improvement in success rates or speed of completion of cyber attack phases using LLM compared to traditional methods such as search engines.

3. Autonomous Offensive Cyber Operations: The team tested the ability of the Llama 3 70b and 405b models to operate autonomously as hacking agents in a controlled environment. The models performed basic network reconnaissance but failed at more advanced tasks such as exploitation and post-exploitation actions. This indicated limited capabilities in autonomous cyber operations.

4. Autonomous software vulnerability discovery and exploitation: The potential of LLMs to identify and exploit software vulnerabilities was evaluated. The finding suggests that Llama 3 models did not outperform traditional tools and manual techniques in real-world scenarios. The CYBERSECEVAL 3 benchmark was based on using zero shots, but Google Naptime demonstrated that the results can be further improved by extending tools and agent scaffolding.

In conclusion, Meta ai effectively describes the challenges of assessing the cybersecurity capabilities of LLMs and presents CYBERSECEVAL 3 to address these challenges. By providing detailed assessments and publicizing their tools, the researchers offer a practical approach to understanding and mitigating the risks posed by advanced ai systems. The proposed methods show that while current LLMs, such as Llama 3, exhibit promising capabilities, their risks can be managed through well-designed guardrails.

Review the ai.meta.com/research/publications/cyberseceval-3-advancing-the-evaluation-of-cybersecurity-risks-and-capabilities-in-large-language-models/” target=”_blank” rel=”noreferrer noopener”>Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on twitter.com/Marktechpost”>twitter and join our Telegram Channel and LinkedIn GrAbove!. If you like our work, you will love our Newsletter..

Don't forget to join our Over 47,000 ML subscribers on Reddit

Find upcoming ai webinars here

Pragati Jhunjhunwala is a Consulting Intern at MarktechPost. She is currently pursuing her Bachelors in technology from Indian Institute of technology (IIT) Kharagpur. She is a technology enthusiast and has a keen interest in the field of software applications and data science. She is always reading about the advancements in different fields of ai and ML.