During the January Microsoft Research Forum, Dipendra Misra, a senior researcher at Microsoft Research Lab NYC and AI Frontiers, explained how layer-selective rank reduction (or LASER) can make large language models more accurate.
With LASER, researchers can “intervene” and replace one weight matrix with an approximate lower-rank one. The weights encode the contextual connections the model has learned; the higher a weight, the more the model relies on it. So does replacing a matrix that carries many correlations and contexts with a smaller approximation make the model less accurate? Based on their test results, the answer, surprisingly, is no.
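The replacement step described above can be sketched with a truncated singular value decomposition, the standard way to build a low-rank approximation of a matrix. This is a minimal illustration, not the researchers' actual code; the matrix size and rank here are arbitrary stand-ins for one layer's weights.

```python
import numpy as np

# Hypothetical stand-in for a single layer's weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

def low_rank_approx(W, rank):
    """Return the best rank-`rank` approximation of W via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the largest `rank` singular values/directions;
    # everything else is discarded, shrinking the information in W.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

W_reduced = low_rank_approx(W, rank=8)
print(W_reduced.shape)                    # same shape as W: (64, 64)
print(np.linalg.matrix_rank(W_reduced))   # effective rank is now 8
```

The approximation has the same shape as the original matrix, so it can be dropped into the model in place of the original weights; only its effective rank is reduced.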
“We are doing an intervention using LASER on the LLM, so one would expect the loss of the model to increase as we do more approximations, which means that the model will perform poorly, right, because we are throwing away information from an LLM, which is trained with large amounts of data,” Misra said. “But to our surprise, we found that if the right type of LASER intervention is performed, the model loss does not increase but actually decreases.”
Misra said his team successfully used LASER on three different open source models: RoBERTa, Llama 2, and Eleuther's GPT-J. He said model accuracy sometimes improved by 20 to 30 percentage points. For example, GPT-J's accuracy at predicting gender from biographies rose from 70.9 percent to 97.5 percent after a LASER intervention.