How to add Llama Guard to your RAG pipelines to moderate LLM inputs and outputs and combat prompt injection
LLM security is an area that we all know deserves extensive attention. Organizations large and small that are eager to adopt generative AI face a major challenge in securing their LLM applications. How to combat prompt injection, handle insecure output, and prevent the disclosure of sensitive information are pressing questions that every AI architect and engineer must answer. Enterprise production-grade LLM applications cannot survive in the wild without robust solutions to address LLM security.
Llama Guard, open sourced by Meta on December 7, 2023, offers a viable solution for mitigating LLM input and output vulnerabilities and combating prompt injection. Llama Guard is part of the umbrella project Purple Llama, "which includes open trust and safety evaluations and tools aimed at leveling the playing field for developers to deploy generative AI models responsibly."(1)
We explored the OWASP Top 10 for LLM Applications a month ago. With Llama Guard, we now have a reasonable solution to start addressing some of those top 10 vulnerabilities, namely:
- LLM01: Prompt Injection
- LLM02: Insecure Output Handling
- LLM06: Sensitive Information Disclosure
In this article, we will explore how to add Llama Guard to a RAG pipeline to:
- Moderate user input
- Moderate LLM output
- Experiment with customizing the out-of-the-box unsafe categories to suit your use case
- Combat prompt injection attempts (a rough sketch of where these moderation hooks fit follows this list)
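Before we get into the details, here is a minimal sketch of where these moderation hooks sit in a RAG pipeline. Both `moderate_with_llama_guard` and `query_rag` are hypothetical placeholders, not a real library API: the first stands in for a Llama Guard wrapper (a concrete version is sketched in the next section), the second for whatever retrieval-plus-generation call your RAG framework exposes.

```python
# Hypothetical glue code showing where Llama Guard gates a RAG pipeline.
# Both helpers below are placeholders you would swap for your own code.

def moderate_with_llama_guard(chat: list[dict]) -> str:
    """Placeholder: return Llama Guard's verdict, e.g. 'safe' or 'unsafe' plus a category code."""
    raise NotImplementedError("plug in your Llama Guard call here")

def query_rag(question: str) -> str:
    """Placeholder: run retrieval + generation and return the answer."""
    raise NotImplementedError("plug in your RAG framework's query call here")

def is_safe(verdict: str) -> bool:
    # Llama Guard's first output line carries the safe/unsafe verdict.
    return verdict.strip().splitlines()[0] == "safe"

def moderated_rag_query(question: str) -> str:
    # 1. Moderate the user input before it reaches the RAG pipeline
    #    (with a customized unsafe category, this is also where prompt
    #    injection attempts can be screened).
    if not is_safe(moderate_with_llama_guard([{"role": "user", "content": question}])):
        return "Sorry, I can't help with that request."

    # 2. Only run retrieval + generation for inputs judged safe.
    answer = query_rag(question)

    # 3. Moderate the LLM output, in the context of the question, before returning it.
    verdict = moderate_with_llama_guard([
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ])
    if not is_safe(verdict):
        return "Sorry, the generated answer was flagged as unsafe."

    return answer
```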
Llama Guard "is a 7B parameter Llama 2-based input-output safeguard model. It can be used for classifying content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe/unsafe and, if unsafe according to a policy, it also lists the violating subcategories."(2)
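As a concrete illustration, the snippet below follows the usage pattern published on the meta-llama/LlamaGuard-7b Hugging Face model card: the conversation is formatted with the tokenizer's chat template, which wraps it in Llama Guard's safety prompt and default taxonomy, and the model's generated text is the verdict. It assumes you have been granted access to the gated model and have a CUDA GPU with enough memory for a 7B model in bfloat16.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # gated model: requires approval on Hugging Face
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map=device
)

def moderate(chat: list[dict]) -> str:
    """Ask Llama Guard to classify the last turn of `chat` as safe/unsafe."""
    # The chat template injects Llama Guard's safety prompt and unsafe-content categories.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    # Only the newly generated tokens hold the verdict, e.g. "safe" or "unsafe" plus a category.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# Prompt classification: moderate the user input on its own.
print(moderate([{"role": "user", "content": "How do I hot-wire a car?"}]))

# Response classification: moderate the assistant's answer in the context of the question.
print(moderate([
    {"role": "user", "content": "How do I kill a process in Linux?"},
    {"role": "assistant", "content": "Use `kill <PID>` or `pkill <name>`."},
]))
```

This prompt/response distinction is exactly what we will plug into the input and output moderation hooks of the RAG pipeline sketched above.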