In a move encouraging accountability in the rapidly evolving generative AI (GenAI) space, Vectara has launched an open-source hallucination evaluation model, marking a significant step toward standardizing the measurement of factual accuracy in large language models (LLMs). This initiative establishes a commercially usable, open-source resource for measuring the degree of “hallucination,” or divergence from verifiable facts, in LLM output, along with a dynamic, publicly available leaderboard.
The release aims to strengthen transparency and provide an objective method for quantifying hallucination risks in leading GenAI tools, an essential measure to promote responsible AI, mitigate misinformation, and support effective regulation. The hallucination evaluation model assesses how faithfully an LLM sticks to the facts when generating content from provided reference material.
Vectara’s hallucination evaluation model, now accessible on Hugging Face under an Apache 2.0 license, offers a clear window into the factual integrity of LLMs. Prior to this, claims by LLM providers about their models’ resistance to hallucinations remained largely unverifiable. Vectara’s model draws on the latest advances in hallucination research to objectively evaluate LLM-generated summaries.
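As a rough illustration, the model can be queried like other Hugging Face cross-encoders. The snippet below is a minimal sketch assuming the cross-encoder interface described on the model card; the model ID is real, but the example texts are invented for demonstration.

```python
# Minimal sketch: scoring a summary against its source document with
# Vectara's hallucination evaluation model (Apache 2.0, on Hugging Face).
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

source = (
    "The Eiffel Tower, completed in 1889, is 330 metres tall "
    "and stands in Paris, France."
)
summary = "The Eiffel Tower is a 330-metre tower in Paris, finished in 1889."

# The model scores (source, summary) pairs: scores near 1 indicate the
# summary is factually consistent with the source; scores near 0 suggest
# hallucination.
score = model.predict([[source, summary]])
print(f"factual consistency score: {score[0]:.3f}")
```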
Accompanying the launch is a leaderboard, likened to a FICO score for GenAI accuracy, maintained by the Vectara team in collaboration with the open-source community. It ranks LLMs by their performance on a standardized set of prompts, giving companies and developers valuable information for informed decision-making.
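To make the leaderboard idea concrete, the sketch below shows one hypothetical way a hallucination rate could be computed for a given LLM: score each (document, summary) pair and count the fraction judged inconsistent. The 0.5 threshold and the helper function are assumptions for illustration, not Vectara’s published methodology.

```python
# Hypothetical leaderboard-style aggregation: each LLM under evaluation
# summarizes the same standardized documents, and the fraction of
# summaries flagged as inconsistent becomes its hallucination rate.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

def hallucination_rate(pairs, threshold=0.5):
    """pairs: list of (source_document, llm_summary) tuples.
    Returns the fraction of summaries scoring below the threshold."""
    scores = model.predict(pairs)
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(pairs)

pairs = [
    ("The meeting was moved from Tuesday to Thursday.",
     "The meeting now takes place on Thursday."),       # consistent
    ("The meeting was moved from Tuesday to Thursday.",
     "The meeting was cancelled."),                     # hallucinated
]
print(f"hallucination rate: {hallucination_rate(pairs):.2f}")
```

A lower hallucination rate would rank a model higher on such a leaderboard.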
The leaderboard results indicate that OpenAI’s models currently lead in performance, closely followed by the Llama 2 models, with Cohere and Anthropic also performing well. Google’s PaLM models, however, scored lower, reflecting the continued evolution and competition in the field.
While not a cure for hallucinations, Vectara’s model is a valuable tool for safer and more accurate GenAI adoption. Its introduction comes at a pivotal moment, as attention to the risks of misinformation grows ahead of major events such as the US presidential election.
The hallucination evaluation model and leaderboard are poised to be instrumental in fostering a data-driven approach to GenAI regulation, offering a standardized benchmark long awaited by both industry and regulatory bodies.
Check out the model and the leaderboard. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound yet easily understandable to a wide audience. The platform draws more than 2 million monthly visits, illustrating its popularity among readers.