Patronus AI has released the LYNX v1.1 series, a significant advancement in artificial intelligence, particularly in detecting hallucinations in AI-generated content. In the context of AI, hallucinations refer to generated information that is unsupported by or contradictory to the data provided, a considerable challenge for applications that rely on accurate and reliable answers. LYNX models target this problem in Retrieval Augmented Generation (RAG) settings, checking that AI-generated answers remain faithful to the documents provided.
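As a rough sketch of how such a judge model is typically queried — note that the prompt wording and the JSON output schema below are illustrative assumptions, not the official Lynx template — the workflow assembles the source document, the question, and the candidate answer into one prompt, then parses a PASS/FAIL verdict from the model's response:

```python
import json


def build_judge_prompt(document: str, question: str, answer: str) -> str:
    """Assemble a hallucination-judge prompt.

    The exact wording here is an illustrative stand-in for the
    model's real prompt template.
    """
    return (
        "Given the DOCUMENT, QUESTION and ANSWER below, decide whether the "
        "ANSWER is faithful to the DOCUMENT. Reply with JSON of the form "
        '{"REASONING": "...", "SCORE": "PASS" or "FAIL"}.\n\n'
        f"DOCUMENT: {document}\n"
        f"QUESTION: {question}\n"
        f"ANSWER: {answer}\n"
    )


def parse_verdict(model_output: str) -> bool:
    """Return True if the judge scored the answer as faithful (PASS)."""
    verdict = json.loads(model_output)
    return verdict.get("SCORE") == "PASS"


# Hypothetical model response, for illustration only:
sample_response = '{"REASONING": "The answer restates the document.", "SCORE": "PASS"}'
print(parse_verdict(sample_response))  # True
```

In a real pipeline the prompt string would be sent to the deployed judge model (e.g. via an inference endpoint) and `parse_verdict` applied to its completion, gating whether the RAG answer is shown to the user.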
LYNX v1.1 70B has already demonstrated exceptional performance in this area. On the HaluBench evaluation, which tests hallucination detection in real-world situations, the 70B model achieved an impressive accuracy of 87.4%, outperforming other leading models, including GPT-4o and GPT-3.5-Turbo, and showing superior accuracy on domain-specific tasks such as answering medical questions in PubMedQA.
The 8B variant, known as Patronus-Lynx-8B-Instruct-v1.1, is a fine-tuned model that balances efficiency and capability. Trained on a diverse set of datasets including CovidQA, PubMedQA, DROP, and RAGTruth, this version supports a maximum sequence length of 128,000 tokens and primarily focuses on the English language. Advanced training techniques such as mixed-precision training and flash attention are employed to improve efficiency without compromising accuracy. Evaluations were performed on 8 NVIDIA H100 GPUs to ensure accurate performance metrics.
Since the release of Lynx v1.0, thousands of developers have integrated it into real-world applications, proving its practical utility. Even with RAG in place, large language models (LLMs) can still produce errors, and Lynx v1.1 significantly improves real-time hallucination detection, making it the best-performing RAG hallucination detection model of its size. The 8B model shows substantial improvements over baseline models such as Llama 3, scoring 87.3% on HaluBench. It outperforms Claude-3.5-Sonnet by 3% and beats GPT-4o on medical questions by 6.8%. Furthermore, compared to Lynx v1.0, it is 1.4% more accurate on HaluBench and outperforms all open-source models on LLM-as-a-judge tasks.
In conclusion, the LYNX 8B model from the LYNX v1.1 series is a robust and efficient tool for detecting hallucinations in AI-generated content. While the 70B model leads in overall accuracy, the 8B version offers a compelling balance between efficiency and performance. Its advanced training techniques, coupled with substantial performance improvements, make it an excellent choice for a variety of machine learning applications, especially where real-time hallucination detection is critical. Lynx v1.1 is open source, with open weights and data, ensuring accessibility and transparency for all users.
Review the Paper, try it on HuggingFace Spaces, and download Lynx v1.1 at HuggingFace. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. If you like our work, you will love our Newsletter.
Don't forget to join our ML SubReddit of over 47,000 subscribers
Find upcoming AI webinars here
Shreya Maji is a Consulting Intern at MarktechPost. She pursued her Bachelor's from the Indian Institute of Technology (IIT) Bhubaneswar. She is an AI enthusiast and likes to keep herself updated on the latest developments. Shreya is particularly interested in real-world applications of cutting-edge technology, especially in the field of data science.