Have you ever been asked a question to which you only knew part of the answer? To give a more informed answer, it might be best to call a friend who is more knowledgeable on the subject.
This collaborative process can also help large language models (LLMs) improve their accuracy. Still, it has been difficult to teach them to recognize when they should collaborate with another model on an answer. Rather than using complex formulas or large amounts of labeled data to spell out when models should work together, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have envisioned a more organic approach.
Their new algorithm, called "Co-LLM," pairs a general-purpose base LLM with a more specialized model and helps them work together. As the former drafts an answer, Co-LLM examines each word (or token) of the draft to see where it can splice in a more accurate token from the expert model. This process leads to more accurate answers for things like medical questions and math and reasoning problems. Since the expert model isn't needed at every step, it also leads to more efficient answer generation.
To decide when a base model needs help from an expert model, the framework uses machine learning to train a "switch variable," a tool that indicates, for each word in the two LLMs' responses, which model is better equipped to generate it. The switch variable is like a project manager, finding areas where a specialist should be called in. If you asked Co-LLM to name some examples of extinct bear species, for instance, the two models would compose a response together: the general-purpose LLM starts drafting it, and the switch variable steps in at the parts where it can insert a better token from the expert model, such as the year a bear species went extinct.
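The token-level deferral described above can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the real Co-LLM learns the switch variable from the base model's hidden states, whereas here a hard-coded confidence threshold plays the role of the switch and two small lookup tables (with made-up values) stand in for trained models.

```python
# Toy sketch of Co-LLM-style token-level deferral (NOT the paper's code):
# the base "model" drafts tokens with a confidence score, and the switch
# (here, a fixed threshold) splices in the expert's token where the base
# model is unsure.

BASE = {  # step -> (token proposed by the general-purpose model, confidence)
    0: ("The", 0.95), 1: ("cave", 0.90), 2: ("bear", 0.85),
    3: ("went", 0.95), 4: ("extinct", 0.90), 5: ("about", 0.90),
    6: ("10,000", 0.20),  # a factual token the base model is unsure about
    7: ("years", 0.95), 8: ("ago.", 0.95),
}
EXPERT = {6: "24,000"}  # the expert model only supplies the hard token

def co_decode(threshold=0.5):
    """Base model drafts each token; defer to the expert whenever the
    switch (here, a confidence threshold) flags the base as unsure."""
    tokens, expert_calls = [], 0
    for step in sorted(BASE):
        token, confidence = BASE[step]
        if confidence < threshold and step in EXPERT:
            token = EXPERT[step]  # splice in the expert's token
            expert_calls += 1
        tokens.append(token)
    return " ".join(tokens), expert_calls

sentence, calls = co_decode()
print(sentence)
print(calls)  # the expert was consulted for just one token
```

Note how the efficiency claim falls out of the design: the expert is queried only at the tokens the switch flags, not at every generation step.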
"With Co-LLM, we are essentially training a general-purpose LLM to 'call' an expert model when needed," says Shannon Shen, an MIT PhD student in electrical engineering and computer science and a CSAIL affiliate, who is lead author of a new paper on the approach. "We use domain-specific data to teach the base model about its counterpart's expertise in areas such as biomedical tasks and math and reasoning questions. This process automatically finds the parts of the data that are difficult for the base model to generate and then instructs the base model to switch to the expert LLM, which was pre-trained on data from a similar field. The general-purpose model provides the 'scaffolding' generation, and when it calls on the specialized LLM, it prompts the expert to generate the desired tokens. Our findings indicate that the LLMs learn collaboration patterns organically, similar to how humans recognize when to turn to an expert to fill in the blanks."
A combination of flexibility and feasibility
Imagine asking a general practitioner to name the ingredients of a specific prescription drug. He or she might answer incorrectly, requiring the expertise of a specialist.
To demonstrate Co-LLM's flexibility, the researchers used data such as the BioASQ medical dataset to pair a base LLM with expert LLMs in different domains, such as the Meditron model, which is pre-trained on unlabeled medical data. This allowed the algorithm to help answer questions a biomedical expert would typically receive, such as naming the mechanisms that cause a particular disease.
Likewise, if you ask a simple LLM to name the ingredients of a specific drug, it might answer incorrectly. With the added expertise of a model specialized in biomedical data, you would get a more accurate answer. Co-LLM also alerts users about which parts of an answer to double-check.
Another example of Co-LLM's performance boost: when tasked with solving a math problem like "a³ · a² if a = 5," the general-purpose model incorrectly calculated the answer to be 125. As Co-LLM trained the model to collaborate with a large math LLM called Llemma, together they determined that the correct solution was 3,125.
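The arithmetic behind this example is easy to verify: exponents with the same base add, so a³ · a² = a⁵, and with a = 5 that is 5⁵ = 3,125. The mistaken answer, 125, is just 5³, i.e., the second factor was dropped:

```python
a = 5
dropped_factor = a ** 3             # 125: ignores the a^2 term
correct = (a ** 3) * (a ** 2)       # exponents add: a^3 * a^2 = a^5
print(dropped_factor, correct, a ** 5)  # 125 3125 3125
```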
Co-LLM provided more accurate answers than both fine-tuned simple LLMs and untuned specialized models working independently. Co-LLM can guide two differently trained models to work together, whereas other effective collaborative LLM approaches, such as "proxy tuning," require all of their component models to be trained similarly. Furthermore, this baseline requires each model to be run simultaneously to produce an answer, whereas the MIT algorithm only activates its expert model for particular tokens, leading to more efficient generation.
When to ask the expert
The MIT researchers' algorithm highlights that more closely mimicking human teamwork can increase the accuracy of collaboration between multiple LLMs. To further boost factual accuracy, the team may draw on human-style self-correction: they are considering a more robust deferral approach that can backtrack when the expert model fails to provide a correct answer. This upgrade would allow Co-LLM to course-correct so that the algorithm can still deliver a satisfactory answer.
The team would also like to update the expert model when new information becomes available (by training only the base model), keeping answers as fresh as possible. This would allow Co-LLM to combine the most up-to-date information with strong reasoning power. Over time, the model could help maintain business documents, using the latest information it has to update them accordingly. Co-LLM could also train small, private models to work with a more powerful LLM to improve documents that must remain within the server.
"Co-LLM presents an interesting approach to learning to choose between two models to improve efficiency and performance," says Colin Raffel, an associate professor at the University of Toronto and associate research director at the Vector Institute, who was not involved in the research. "Since routing decisions are made at the token level, Co-LLM provides a granular way to defer difficult generation steps to a more powerful model. The unique combination of model- and token-level routing also provides a great deal of flexibility that similar methods lack. Co-LLM contributes to an important line of work that aims to develop ecosystems of specialized models that outperform expensive monolithic AI systems."
Shen wrote the paper with four other CSAIL affiliates: PhD student Hunter Lang '17, MEng '18; former postdoc and Apple AI/ML researcher Bailin Wang; MIT associate professor of electrical engineering and computer science Yoon Kim; and professor and Jameel Clinic affiliate David Sontag PhD '10, all of whom are part of the MIT-IBM Watson AI Lab. Their research was supported, in part, by the National Science Foundation, the National Defense Science and Engineering Graduate (NDSEG) Fellowship, the MIT-IBM Watson AI Lab, and Amazon. Their work was presented at the Annual Meeting of the Association for Computational Linguistics.