Generative AI models are increasingly being incorporated into healthcare environments, in some cases perhaps prematurely. Early adopters believe they will unlock greater efficiency while revealing insights that would otherwise be overlooked. Meanwhile, critics point out that these models have flaws and biases that could contribute to worse health outcomes.
But is there a quantitative way to tell how useful or harmful a model might be when asked to do things like summarize patient records or answer health-related questions?
Hugging Face, the AI startup, proposes a solution in a recently launched benchmark called Open Medical-LLM. Created in partnership with researchers from the non-profit Open Life Science AI and the Natural Language Processing Group at the University of Edinburgh, Open Medical-LLM aims to standardize how the performance of generative AI models is evaluated across a range of medical tasks.
Open Medical-LLM is not a from-scratch benchmark, per se, but rather a combination of existing test suites (MedQA, PubMedQA, MedMCQA and so on) designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics, and clinical practice. The benchmark contains open-ended and multiple-choice questions that require medical reasoning and understanding, drawing on material including U.S. and Indian medical licensing exams and question banks from college biology exams.
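To give a sense of what these constituent test suites look like, here is a minimal sketch that loads one of them, MedMCQA, from the Hugging Face Hub and prints a single multiple-choice question. The dataset identifier and field names (`opa` through `opd`, `cop`) are assumptions based on the publicly hosted version of MedMCQA and may differ from what the leaderboard uses internally.

```python
# A minimal sketch, assuming MedMCQA is available on the Hugging Face Hub
# under the "medmcqa" identifier with its usual column layout.
from datasets import load_dataset

# Each example is a multiple-choice medical question with one correct option.
medmcqa = load_dataset("medmcqa", split="validation")

example = medmcqa[0]
question = example["question"]
options = [example["opa"], example["opb"], example["opc"], example["opd"]]
answer_index = example["cop"]  # index of the correct option (0-3)

print(question)
for i, option in enumerate(options):
    marker = "*" if i == answer_index else " "
    print(f" {marker} ({chr(65 + i)}) {option}")
```

A model being benchmarked would be prompted with the question and options and scored on whether it picks the option at `answer_index`; the leaderboard aggregates accuracy of this kind across all of the included test suites.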
“(Open Medical-LLM) allows researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advances in the field and, ultimately, contribute to better patient care and outcomes,” Hugging Face wrote in a blog post.
Hugging Face is positioning the benchmark as a “robust evaluation” of generative AI models intended for healthcare. But some medical experts on social media warned against making too much of Open Medical-LLM, lest poorly informed deployments follow.
Liam McCoy, a physician, noted on X that the gap between the artificial environment of answering medical questions and actual clinical practice can be quite broad:
It's great progress to see these head-to-head comparisons, but it's important that we also remember how big the gap is between the artificial environment of answering medical questions and actual clinical practice. Not to mention the idiosyncratic risks that these metrics cannot capture.
– Liam McCoy, MD MSc (@LiamGMcCoy), April 18, 2024 (twitter.com/LiamGMcCoy/status/1780952462821863715)
Hugging Face research scientist Clémentine Fourrier, co-author of the blog post, agreed.
“These leaderboards should only be used as a first approximation of which (generative AI model) to explore for a given use case, but then a deeper phase of testing is always needed to examine the model's limits and relevance in real conditions,” Fourrier responded on X (twitter.com/clefourrier/status/1780955155300745247).
This is reminiscent of Google's experience when it tried to bring an AI-powered diabetic retinopathy screening tool to health systems in Thailand.
Google created a deep learning system that scans images of the eye for evidence of retinopathy, a leading cause of vision loss. But despite high theoretical accuracy, the tool proved impractical in real-world testing (blog.google/technology/health/healthcare-ai-systems-put-people-center/), frustrating both patients and nurses with inconsistent results and a general lack of harmony with practices on the ground.
Tellingly, of the 139 AI-related medical devices the U.S. Food and Drug Administration has approved to date, none uses generative AI. It is exceptionally difficult to test how the performance of a generative AI tool in the lab will translate to hospitals and outpatient clinics and, perhaps more importantly, how the results might evolve over time.
That's not to say that Open Medical-LLM isn't useful or informative. The results leaderboard, at the very least, serves as a reminder of how poorly the models answer basic health questions. But neither Open Medical-LLM nor any other benchmark is a substitute for carefully thought-out real-world testing.