Large Language Models (LLMs) are among the most notable recent innovations in artificial intelligence (AI) and deep learning. Well-known LLMs such as GPT, PaLM, and LLaMA have shown remarkable potential in content generation, handling tasks from question answering and text summarization to language translation and code completion. These models, including ChatGPT, undergo extensive pre-training on vast unsupervised text corpora. However, recent studies suggest that the commonly adopted practice of fine-tuning may not be as essential as previously thought.
Alignment tuning, the process of adapting base LLMs for use as open-domain AI assistants, has become an industry standard. It includes reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). This standard was challenged by a study called LIMA, which showed that as few as 1,000 samples for SFT can be sufficient to achieve significant alignment performance.
The superficial alignment hypothesis, introduced by LIMA, proposes that alignment tuning, rather than radically changing a base LLM's underlying behavior, trains it to adopt particular data formats for engaging with users. LIMA thereby demonstrated that a handful of examples under supervised fine-tuning can produce high-quality aligned models.
Since little research has rigorously tested the superficial alignment hypothesis, a team of researchers from the Allen Institute for AI and the University of Washington has revisited, in a recent paper, the alignment tuning techniques widely used to turn base LLMs into useful open-domain AI assistants: preference tuning, achieved through reinforcement learning from human feedback, and instruction learning, achieved through supervised fine-tuning.
To study the impact of alignment tuning, the team examined the shift in token distribution between base LLMs and their aligned counterparts, such as Llama-2 and Llama-2-chat. They found that a base LLM and its aligned version share the top-ranked tokens and decode almost identically at most token positions. Stylistic tokens, such as discourse markers and safety disclaimers, experience the largest distribution shifts. This provides compelling evidence for the hypothesis that alignment tuning primarily teaches the linguistic style of an AI assistant, while the base LLM already possesses the knowledge needed to answer users' queries.
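A minimal sketch of this kind of token-distribution comparison, assuming Hugging Face transformers and the public Llama-2 checkpoints (this is not the authors' released code; the top-k threshold and the prompt are illustrative choices, not the paper's exact setup):

```python
# Sketch: decode a response with the aligned model, then check at each
# position whether the chosen token also ranks in the base model's top-k.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"          # base LLM
ALIGNED = "meta-llama/Llama-2-7b-chat-hf"  # its aligned counterpart

tok = AutoTokenizer.from_pretrained(BASE)  # both models share one tokenizer
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, device_map="auto")
chat = AutoModelForCausalLM.from_pretrained(ALIGNED, torch_dtype=torch.float16, device_map="auto")

def shifted_positions(prompt: str, max_new_tokens: int = 128, k: int = 3) -> list[bool]:
    """True at a position means the aligned model's token falls outside
    the base model's top-k there (a 'shifted' token position)."""
    ids = tok(prompt, return_tensors="pt").input_ids.to(chat.device)
    full = chat.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)[0]
    with torch.no_grad():  # one base-model forward pass over the whole sequence
        logits = base(full.unsqueeze(0).to(base.device)).logits[0]
    shifts = []
    for pos in range(ids.shape[1], full.shape[0]):
        topk = torch.topk(logits[pos - 1], k).indices.tolist()  # base's top-k at pos
        shifts.append(int(full[pos]) not in topk)
    return shifts

flags = shifted_positions("What should I do if I sprain my ankle?")
print(f"{sum(flags)}/{len(flags)} shifted positions")  # most positions agree
```

The fraction of shifted positions tends to be small, and inspecting where the flags fire is how one observes that the disagreements cluster on stylistic tokens rather than content words.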
In response to these findings, the team posed a research question: to what extent can base LLMs be aligned without SFT or RLHF? They propose URIAL (Untuned LLMs with Restyled In-context Alignment), an alignment technique that requires no tuning at all. With just three constant stylistic examples and one system prompt, URIAL achieves effective alignment purely through in-context learning (ICL) with base LLMs.
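A rough sketch of how such a tuning-free prompt can be assembled (the system text, the `# Query:`/`# Answer:` markers, and the exemplars below are illustrative stand-ins, not the exact prompts from the paper):

```python
# Sketch of URIAL-style in-context alignment: one system message plus a few
# fixed stylistic (query, answer) exemplars are prepended to the user query,
# and the untuned base LLM simply continues the text.
SYSTEM = (
    "Below are conversations between a curious user and an AI assistant. "
    "The assistant gives helpful, detailed, and polite answers, and "
    "declines to help with unsafe requests."
)

EXEMPLARS = [  # three constant stylistic examples, as in URIAL
    ("What is the capital of France?",
     "The capital of France is Paris, which is also the country's largest "
     "city and its political and cultural center."),
    ("How do I make a cup of tea?",
     "Boil fresh water, pour it over a tea bag or loose leaves, steep for "
     "2-5 minutes depending on the tea, then remove the leaves and enjoy."),
    ("Can you help me pick a lock?",
     "I'm sorry, but I can't help with that. If you're locked out of your "
     "own property, a licensed locksmith is the safest option."),
]

def urial_prompt(query: str) -> str:
    """Build the full in-context alignment prompt for a base LLM."""
    parts = [SYSTEM, ""]
    for q, a in EXEMPLARS:
        parts += [f"# Query:\n{q}", f"# Answer:\n{a}", ""]
    parts += [f"# Query:\n{query}", "# Answer:\n"]
    return "\n".join(parts)

# The base model generates a continuation of urial_prompt(user_query);
# decoding is stopped at the next "# Query:" marker.
```

Because the exemplars never change, the approach adds only a fixed prompt prefix per request, with no gradient updates to the base model.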
On a new evaluation set called just-eval-instruct, the team provides a detailed and interpretable analysis showing that base LLMs with URIAL can perform on par with or better than LLMs aligned with SFT (Mistral-7b-Instruct) or with SFT+RLHF (Llama-2-70b-chat). The results show that careful prompting and in-context learning can dramatically close the gap between tuning-free and tuning-based alignment strategies.
In conclusion, the evaluation results highlight that alignment tuning is superficial: it mainly involves the adoption of linguistic style and relies on the pre-existing knowledge of the base LLMs.
Check out the Paper and Project. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.