Large pre-trained language models are remarkably capable, but eliciting specific desired behaviors usually requires additional tuning. Tuning has grown increasingly resource-intensive, and when a model's weights are kept private, as with OpenAI's GPT-4 in 2023, it can become prohibitively expensive or outright impossible. Efficiently customizing ever-larger language models for diverse user and application needs therefore remains a major open challenge.
Researchers from the University of Washington and the Allen Institute for AI present proxy tuning, a decoding-time algorithm that adapts large black-box language models (LMs) without access to their internal weights. The method takes advantage of a smaller tuned LM and computes the difference between its predictions and those of its untuned counterpart. At decoding time, this difference is applied to the original predictions of the larger base model, shifting them in the direction of tuning and effectively achieving the benefits of direct fine-tuning.
Proxy tuning aims to close the gap between a base language model and its directly tuned version without altering the base model's parameters. The approach tunes a smaller LM and uses the contrast between its predictions and those of its untuned counterpart to shift the base model's original predictions toward the direction of tuning. Importantly, proxy tuning preserves the benefits of extensive pre-training while steering the large model toward the desired behaviors.
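The logit arithmetic described above can be illustrated with a minimal sketch. This is not the authors' code: it assumes three models that share a vocabulary, represents their next-token logits as plain lists, and uses illustrative names (`proxy_tuned_probs`, the toy logit values) that do not appear in the paper.

```python
import math

def proxy_tuned_probs(base_logits, expert_logits, anti_expert_logits):
    """Shift each base logit by (expert - anti-expert), then softmax.

    base_logits:        next-token logits of the large black-box model
    expert_logits:      logits of the small *tuned* model
    anti_expert_logits: logits of the same small model *before* tuning
    """
    shifted = [b + (e - a) for b, e, a in
               zip(base_logits, expert_logits, anti_expert_logits)]
    # Numerically stable softmax over the shifted logits.
    m = max(shifted)
    exps = [math.exp(s - m) for s in shifted]
    z = sum(exps)
    return [x / z for x in exps]

# Toy vocabulary of 3 tokens: tuning taught the small expert to
# prefer token 2, so the large model's choice is steered toward it.
base   = [2.0, 1.0, 0.5]   # base model alone would pick token 0
expert = [0.0, 0.0, 2.0]   # tuned small model boosts token 2
anti   = [0.0, 0.0, 0.0]   # untuned small model is indifferent
probs = proxy_tuned_probs(base, expert, anti)
next_token = max(range(len(probs)), key=probs.__getitem__)
```

In this toy example the shifted logits become [2.0, 1.0, 2.5], so greedy decoding now selects token 2 even though the base model alone would have picked token 0; when the expert and anti-expert agree, the base model's distribution is left unchanged.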
Untuned base models struggle with AlpacaFarm and GSM questions, achieving low win rates and accuracy. Proxy tuning substantially improves performance, reaching an 88.0% win rate on AlpacaFarm and 32.0% accuracy on GSM for the 70B base model. On Toxigen, proxy tuning reduces toxicity to 0%. On TruthfulQA's open-ended setting, proxy-tuned models even outperform the directly tuned CHAT models in truthfulness. Across scenarios, proxy tuning closes 91.1% of the performance gap at the 13B scale and 88.1% at the 70B scale, demonstrating its effectiveness in improving model performance without direct fine-tuning.
In summary, researchers at the University of Washington and the Allen Institute for AI have proposed proxy tuning, a promising approach that fine-tunes large language models at decoding time by modifying their output logits. It is an efficient alternative to traditional fine-tuning, making large language models more accessible, especially for those with limited resources, and it addresses the challenge of adapting proprietary models to diverse use cases. The authors conclude by inviting model-producing organizations to share their models' output probabilities for wider use of the technique.
Check out the Paper. All credit for this research goes to the researchers of this project.
Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in Mechanical Engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching applications of machine learning in healthcare.