Large language models (LLMs) such as ChatGPT and GPT-4 have made significant advances in ai research, outperforming previous state-of-the-art methods on several benchmarks. These models show great potential in the healthcare domain, offering advanced tools to improve efficiency through natural language understanding and response. However, the integration of LLMs into biomedical and healthcare applications faces a critical challenge: their vulnerability to malicious manipulation. Even commercially available LLMs with built-in safeguards can be tricked into generating harmful results. This susceptibility poses significant risks, especially in medical settings where the stakes are high. The problem is further exacerbated by the possibility of data poisoning during model fine-tuning, which can lead to subtle alterations in LLM behavior that are difficult to detect under normal circumstances but manifest when triggered by specific inputs.
Previous research has explored the manipulation of LLMs in general domains, demonstrating the possibility of influencing model outputs to favor specific terms or recommendations. These studies have typically focused on simple scenarios involving single trigger words, resulting in consistent alterations in model responses. However, these approaches often oversimplify real-world conditions, particularly in complex medical settings. The applicability of these manipulation techniques to healthcare settings remains uncertain, as the complexities and nuances of medical information pose unique challenges. Furthermore, the research community has yet to thoroughly investigate behavioral differences between clean and poisoned models, leaving a significant gap in understanding their respective vulnerabilities. This lack of comprehensive analysis hampers the development of effective safeguards against potential attacks on LLMs in critical domains such as healthcare.
In this work, researchers from the National Center for Biotechnology Information (NCBI), the National Library of Medicine (NLM), and the University of Maryland at College Park, Department of Computer Science, aim to investigate two modes of adversarial attacks on three medical tasks, focusing on fine-tuning and indication-based methods for attacking standard LLMs. The study uses real-world patient data from the MIMIC-III and PMC-Patients databases to generate standard and adversarial responses. The research examines the behavior of LLMs, including the proprietary GPT-3.5-turbo and the open-source Llama2-7b, on three representative medical tasks: COVID-19 vaccination guidance, medication prescription, and diagnostic test recommendations. The goals of the attacks on these tasks are to discourage vaccination, suggest harmful drug combinations, and promote unnecessary medical tests. The study also evaluates the transferability of attack models trained on MIMIC-III data to real-life patient summaries from PMC-Patients, providing a comprehensive analysis of LLM vulnerabilities in healthcare settings.
Experimental results reveal significant vulnerabilities in LLMs to adversarial attacks through prompt manipulation and fine-tuning the model with poisoned training data. Using the MIMIC-III and PMC-Patients datasets, the researchers observed substantial changes in model results on three medical tasks when subjected to these attacks. For example, under prompt-based attacks, vaccine recommendations dropped dramatically from 74.13% to 2.49%, while recommendations for dangerous drug combinations increased from 0.50% to 80.60%. Similar trends were observed for recommendations for unnecessary diagnostic tests.
The fine-tuned models showed comparable vulnerabilities, with both GPT-3.5-turbo and Llama2-7b exhibiting significant shifts toward malicious behavior when trained on adversarial data. The study also demonstrated the transferability of these attacks across different data sources. Notably, GPT-3.5-turbo showed increased resilience to adversarial attacks compared to Llama2-7b, possibly due to its extensive background knowledge. The researchers found that the effectiveness of the attacks generally increased with the proportion of adversarial samples in the training data, reaching saturation points at different levels for various tasks and models.
This research provides a comprehensive analysis of the vulnerabilities of LLM models to adversarial attacks in medical contexts, showing that both open-source and commercial models are susceptible. The study reveals that while adversarial data does not significantly impact a model’s overall performance on medical tasks, complex scenarios require a higher concentration of adversarial samples to achieve attack saturation compared to domain-general tasks. The distinctive weight patterns observed in fine-tuned poisoned models compared to clean models offer a potential avenue for developing defensive strategies. These findings underscore the critical need for advanced security protocols in LLM deployment, especially as these models are increasingly integrated into healthcare automation processes. The research highlights the importance of implementing strong safeguards to ensure the safe and effective application of LLM models in critical sectors such as healthcare, where the consequences of manipulated results could be severe.
Review the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on twitter.com/Marktechpost”>twitter.
Join our Telegram Channel and LinkedIn GrAbove!.
If you like our work, you will love our Newsletter..
Don't forget to join our Subreddit with over 46 billion users
Asjad is a consultant intern at Marktechpost. He is pursuing Bachelors in Mechanical Engineering from Indian Institute of technology, Kharagpur. Asjad is a Machine Learning and Deep Learning enthusiast who is always researching the applications of Machine Learning in the healthcare domain.
<script async src="//platform.twitter.com/widgets.js” charset=”utf-8″>