One of the most interesting applications of large language models (LLMs) is medicine, with use cases ranging from medical research and personalized health plans to clinical diagnosis. However, given how safety-critical the field is, these models need to be tested across many use cases to ensure they are safe to deploy. Furthermore, the models should be released publicly to allow scrutiny.
To that end, a group of researchers has released MediTron, a set of domain-adapted LLMs built on LLaMA-2. The model comes in two variants: one with 7B parameters and another with 70B. MediTron is a foundation model that can be adapted to specific downstream tasks via RLHF or instruction tuning, and its use cases include answering medical exam questions, general health queries, and disease information queries, as well as supporting differential diagnosis.
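As a rough illustration of using the released checkpoints, here is a minimal sketch of loading MediTron for inference with the Hugging Face `transformers` library. The hub id `epfl-llm/meditron-7b` and the example prompt are assumptions for illustration, not details confirmed in the article.

```python
# Minimal sketch: loading a MediTron checkpoint with Hugging Face transformers.
# The repo id below is an assumption about where the weights are hosted;
# adjust it to the actual hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "epfl-llm/meditron-7b"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical health query, for illustration only.
prompt = "Question: What are common symptoms of iron-deficiency anemia?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```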
The MediTron training corpus is quite comprehensive and consists of clinical practice guidelines, medical articles along with their summaries, and domain-general pre-training data. Training efficiency was optimized with the Megatron-LLM distributed training library, whose parallelization scheme combines data, pipeline, and tensor parallelism to speed up the process.
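To make the three-way parallelism concrete, here is a small back-of-the-envelope sketch of how the three axes partition a GPU cluster. The GPU count and parallelism degrees below are hypothetical, not the configuration reported for MediTron.

```python
# Illustrative arithmetic only (not the Megatron-LLM API): how a 3-D
# parallelism scheme partitions a GPU cluster. All degrees are hypothetical.
world_size = 128          # total GPUs (assumed)
tensor_parallel = 8       # each weight matrix is sharded across 8 GPUs
pipeline_parallel = 4     # the layer stack is split into 4 pipeline stages

# Data parallelism replicates the (tensor x pipeline) model shards
# across whatever GPUs remain:
assert world_size % (tensor_parallel * pipeline_parallel) == 0
data_parallel = world_size // (tensor_parallel * pipeline_parallel)
print(f"data-parallel replicas: {data_parallel}")  # -> 4
```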
The researchers made an initial evaluation of the models' truthfulness by comparing them against their base models.
They used the TruthfulQA dataset as a benchmark, running a one-shot evaluation for the 7B model and a zero-shot evaluation for the 70B model. Both variants outperformed their counterparts, with MediTron-70B averaging 71.2 versus 54.8 for LLaMA-2-70B, and MediTron-7B averaging 28.3 versus 12.6 for LLaMA-2-7B.
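For readers unfamiliar with the terminology, the snippet below sketches the difference between the two prompting setups. The questions and wording are made up for illustration and are not drawn from TruthfulQA.

```python
# Zero-shot vs. one-shot prompting, as in the TruthfulQA comparison above
# (one-shot for the 7B model, zero-shot for the 70B). Example text is
# hypothetical.
question = "Can antibiotics treat viral infections?"

# Zero-shot: the model sees only the question.
zero_shot = f"Q: {question}\nA:"

# One-shot: a single worked demonstration precedes the question.
demo_q = "Does vitamin C cure the common cold?"  # hypothetical demo item
demo_a = "No; it may slightly shorten duration, but it is not a cure."
one_shot = f"Q: {demo_q}\nA: {demo_a}\n\nQ: {question}\nA:"
```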
For further evaluation, the researchers used several test benchmarks, such as MedQA and PubMedQA, and measured accuracy on multiple-choice question-answering tasks. For comparison, they also evaluated other LLMs, such as LLaMA-7B, LLaMA-70B, and Mistral-7B-instruct. The results show that MediTron-7B and MediTron-70B outperformed their competitors on almost all datasets, demonstrating superior capabilities.
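A common way to score such multiple-choice benchmarks with a causal LM is to pick the answer option whose text the model assigns the highest log-likelihood. The sketch below shows that generic recipe; it is an assumption about the setup, not MediTron's actual evaluation code.

```python
# Generic log-likelihood scoring for multiple-choice QA (MedQA/PubMedQA-style).
# This mirrors common evaluation-harness behavior; it is not the paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def option_logprob(model, tokenizer, question, option):
    """Total log-prob the model assigns to `option` following `question`."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits              # (1, T, vocab)
    # Position i of the logits predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1]                       # first option token
    option_tokens = full_ids[0, start:]
    idx = torch.arange(start - 1, full_ids.shape[1] - 1)
    return log_probs[idx, option_tokens].sum().item()

def predict_choice(model, tokenizer, question, options):
    """Return the index of the highest-scoring answer option."""
    scores = [option_logprob(model, tokenizer, question, o) for o in options]
    return max(range(len(options)), key=scores.__getitem__)

# Accuracy is then the fraction of items where predict_choice matches the key.
```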
Although the model has been trained on a large medical corpus and performs well on multiple benchmarks, users should be aware of its limitations and should not deploy it in medical applications without additional testing. The researchers have only begun to understand the model's capabilities and limitations, and have therefore cautioned against its use in medical systems for the time being.
In conclusion, MediTron is a set of domain-specific LLMs trained on a wide range of medical datasets. It comes in two variants, one with 7B parameters and another with 70B, and both outperformed the other models considered in the evaluation. The researchers stress that the model should not be deployed without additional training and testing, given how critical the field is. Overall, MediTron is an interesting development in medical AI, with the potential to handle a variety of medical tasks and assist medical professionals.
Check out the Paper, Model 7B, and Model 70B. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform has more than 2 million monthly visits, which illustrates its popularity among readers.