LLMs such as GPT-4, MedPaLM-2, and Med-Gemini perform well on medical benchmarks but struggle to replicate the diagnostic skills of physicians. Unlike physicians, who gather information from patients through structured questioning and examination, LLMs often lack the logical coherence and specialized knowledge this requires, which leads to inadequate diagnostic reasoning. While they can assist in initial assessments by drawing on medical corpora, their responses may be inconsistent and may not comply with professional guidelines, particularly in complex or specialized cases. This gap limits their reliability for medical diagnosis.
Researchers from Zhejiang University and Ant Group have introduced the RuleAlign framework, which aims to align LLMs with specific diagnostic rules to improve their effectiveness as AI physicians. They developed a medical dialogue dataset, UrologyRD, focused on rule-based urological interactions. Using preference learning, the model is trained so that its responses follow established protocols, without the need for additional human annotations. Experimental results show that RuleAlign improves the performance of LLMs in both single-round and multi-round evaluations, demonstrating its potential in medical diagnosis.
Medical LLMs are advancing rapidly in academia and industry, with much of the effort focused on integrating medical data into general-purpose LLMs through supervised fine-tuning (SFT). Notable examples include MedPaLM-2, Med-Gemini, and Chinese models such as DoctorGLM and HuatuoGPT-II. These models often use specialized datasets, such as BianQueCorpus, to balance question-asking and advice-giving capabilities. Alignment approaches such as RLHF and DPO further optimize LLMs through preference learning and reward models. Techniques such as SLiC and SPIN refine alignment by combining loss functions, data augmentation, and iterative training.
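For readers unfamiliar with DPO, the following minimal sketch shows the per-pair loss in its commonly published form. It is not taken from the RuleAlign paper: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are summed log-probabilities that a real training loop would obtain from the policy model and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy prefers the chosen response
    over the rejected one, relative to the reference model.
    All names and the beta default are illustrative assumptions.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z)), which equals -log(sigmoid(z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-4.0, -4.0, -4.0, -4.0))
```

The loss falls as the policy assigns relatively more probability to the preferred response, which is the behavior preference learning relies on.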
To create the UrologyRD dataset, the researchers first collected detailed diagnostic rules by summarizing relevant medical conversations and extracting key guidelines. These rules focus on urology and specify disease-related constraints and the essential evidence needed for diagnosis. The dataset was generated by mapping disease names to broader categories and tailoring dialogues to these rules. To align LLMs with human goals, the RuleAlign framework employs preference learning. It trains the model on rule-based dialogues, distinguishes preferred from dispreferred responses, and refines the preference pairs using semantic similarity and changes to dialogue order, with the goal of improving diagnostic accuracy.
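The pair-construction idea described above can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the helper names (`jaccard`, `pick_rejected_by_similarity`, `shuffle_order`, `build_pairs`) are hypothetical, token-overlap Jaccard similarity stands in for whatever semantic similarity measure the paper uses, and choosing the least similar candidate as the dispreferred response is an assumption.

```python
import random

def jaccard(a, b):
    """Token-overlap similarity; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def pick_rejected_by_similarity(chosen, candidates):
    """Assumption: treat the candidate least similar to the rule-following
    response as the dispreferred one."""
    return min(candidates, key=lambda c: jaccard(chosen, c))

def shuffle_order(turns, seed=0):
    """Return the same turns in a different order (dispreferred by construction)."""
    if len(set(turns)) < 2:
        raise ValueError("need at least two distinct turns to reorder")
    rng = random.Random(seed)
    shuffled = list(turns)
    while shuffled == list(turns):
        rng.shuffle(shuffled)
    return shuffled

def build_pairs(prompt, chosen_turns, candidates, seed=0):
    """Build two preference pairs without any human annotation."""
    chosen = " ".join(chosen_turns)
    return [
        {"prompt": prompt, "chosen": chosen,
         "rejected": pick_rejected_by_similarity(chosen, candidates)},
        {"prompt": prompt, "chosen": chosen,
         "rejected": " ".join(shuffle_order(chosen_turns, seed))},
    ]

# Hypothetical example data, invented for illustration only.
turns = ["How long have you had the pain?",
         "Is there blood in the urine?",
         "Do you have a fever?"]
candidates = ["How long have you had the pain? Is there any fever?",
              "You should drink more water and rest."]
pairs = build_pairs("Patient: my lower back hurts.", turns, candidates)
```

Each resulting dictionary has the `prompt` / `chosen` / `rejected` shape that preference-learning objectives such as DPO typically consume.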
LLMs are evaluated for medical diagnosis with both single-round and multi-round tests. Single-round tests apply metrics such as perplexity, ROUGE, and BLEU, while standardized patient (SP) tests assess the models on information completeness, guidance rationality, diagnostic logic, clinical applicability, and treatment logic. RuleAlign demonstrates superior performance, improving ROUGE and BLEU scores and reducing perplexity. It aligns LLM responses with diagnostic rules effectively, although it sometimes struggles with hallucinations and logical consistency. The method's optimization strategies, semantic similarity and order shuffling, significantly improve the model's accuracy and consistency in generating medical dialogues.
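To make the single-round metrics more concrete, here is a small sketch of two of them: perplexity computed from per-token log-probabilities, and a ROUGE-L F1 score based on the longest common subsequence. The function names are illustrative, tokenization is plain whitespace splitting, and the balanced F1 variant is an assumption; the paper's exact evaluation setup may differ.

```python
import math

def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate, reference):
    """ROUGE-L as a balanced F1 over whitespace tokens (simplified variant)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# A model that assigns probability 0.5 to every token has perplexity 2.
print(perplexity([math.log(0.5)] * 4))
print(rouge_l_f1("the patient has pain", "the patient reports pain"))
```

Lower perplexity and higher ROUGE both indicate generated text that is closer to the reference dialogues, which is the direction of improvement reported for RuleAlign.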
In conclusion, the study presents UrologyRD, a diagnostic rule-based medical dialogue dataset, and proposes RuleAlign, a method for automatic preference pair synthesis and alignment. Experiments demonstrate the effectiveness of RuleAlign in various evaluation settings. Although LLMs such as GPT-4, MedPaLM-2, and Med-Gemini perform competitively with human experts, challenges remain in their diagnostic capabilities, especially in gathering information from patients and reasoning toward a diagnosis. RuleAlign aims to address these issues by aligning LLMs with diagnostic rules, which could advance research in AI-driven medical applications and strengthen the role of LLMs as AI physicians.
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a Consulting Intern at Marktechpost and a dual degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, she brings a fresh perspective to the intersection of AI and real-life solutions.