Tasks that require generating or verifying factual statements (such as question answering, fact checking, and even unconditional text generation) are handled relatively well by current language models (LMs). However, there is growing evidence that LMs become more likely to produce false but frequently repeated statements as they scale, so they are far from fully reliable. Matters are further complicated by the fact that LMs offer more than one way to pose a fact-generation task.
They can be queried generatively (asking for the most likely answer to a question) or discriminatively (presenting a question-answer pair and asking whether the answer is acceptable), but the two procedures sometimes yield different results. Generative queries can fail when probability mass is spread over multiple conflicting answers, while discriminative queries can fail because of miscalibration or subtle dependence on how the question is phrased. Given these noisy and often contradictory signals, how should we extract an LM's best estimate of the truth? In this research, MIT researchers introduce the CONSENSUS GAME, a signaling game that offers a method for reconciling generative and discriminative LM decoding procedures.
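To make the two query modes concrete, here is a minimal sketch (not the authors' code) of querying a single LM both ways. The model choice (gpt2), the prompt formats, and the helper `sequence_logprob` are illustrative assumptions:

```python
# Minimal sketch: the same LM queried generatively and discriminatively.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def sequence_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probs the LM assigns to `continuation` given `prompt`.
    Assumes the prompt tokenizes identically on its own and as a prefix."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens (next-token prediction is shifted by one).
    for pos in range(prompt_len, full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

question = "Q: What is the capital of France?\nA:"
answers = [" Paris", " Lyon"]

# Generative query: which answer string gets more probability mass?
gen_scores = {a: sequence_logprob(question, a) for a in answers}

# Discriminative query: shown the (question, answer) pair, is "Yes" or "No"
# the more likely verdict?
def disc_score(answer: str) -> float:
    verdict_prompt = question + answer + "\nIs this answer correct? "
    return sequence_logprob(verdict_prompt, "Yes") - sequence_logprob(verdict_prompt, "No")

disc_scores = {a: disc_score(a) for a in answers}
# The two rankings need not agree; that inconsistency is what the
# CONSENSUS GAME is designed to resolve.
```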
In this game, a GENERATOR agent must communicate an abstract correctness value (correct or incorrect) to a DISCRIMINATOR agent, but can do so only through a restricted set of candidate natural language strings. Intuitively, a successful joint policy for this game is one in which the GENERATOR and DISCRIMINATOR agree on the assignment of strings to correctness values; such a policy can then be examined to find candidates that both agents deem correct. Achieving this requires solving a multi-step game with a difficult, string-valued action space. No-regret learning algorithms have recently become the standard approach for computing winning strategies in games such as Poker, Stratego, and Diplomacy; a simplified sketch of such dynamics follows below.
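As a rough illustration of what such a solver might look like, here is a simplified, KL-anchored Hedge sketch of no-regret dynamics for the consensus game. It is not the paper's exact procedure; the function name, payoff bookkeeping, and hyperparameters (`eta`, `lam`) are all assumptions made for the sketch:

```python
# Simplified sketch of no-regret (Hedge-style) dynamics for the consensus
# game, regularized toward the LM's initial policies.
import numpy as np

def equilibrium_policies(gen_logprobs, disc_logprobs, iters=500, eta=0.1, lam=0.1):
    """
    gen_logprobs:  (2, K) LM log-probs of K candidate answers when prompted
                   to answer correctly (row 0) or incorrectly (row 1).
    disc_logprobs: (K, 2) LM log-probs of the verdicts (correct, incorrect)
                   for each candidate answer.
    Returns approximate equilibrium generator and discriminator policies.
    """
    def normalize(logw, axis):
        w = np.exp(logw - logw.max(axis=axis, keepdims=True))
        return w / w.sum(axis=axis, keepdims=True)

    pi_g0 = normalize(gen_logprobs, 1)    # initial generator policy, (2, K)
    pi_d0 = normalize(disc_logprobs, 1)   # initial discriminator policy, (K, 2)
    pi_g, pi_d = pi_g0.copy(), pi_d0.copy()
    Qg = np.zeros_like(pi_g)  # cumulative expected payoffs, generator
    Qd = np.zeros_like(pi_d)  # cumulative expected payoffs, discriminator

    for t in range(1, iters + 1):
        # Both players are rewarded when the DISCRIMINATOR's verdict matches
        # the hidden correctness value v handed to the GENERATOR.
        Qg += pi_d.T           # payoff for emitting answer y when holding v
        Qd += 0.5 * pi_g.T     # payoff for verdict v on answer y (v ~ uniform)
        # KL-anchored Hedge update: stay close to the LM's initial policies.
        pi_g = normalize(lam * np.log(pi_g0) + eta * Qg / t, 1)
        pi_d = normalize(lam * np.log(pi_d0) + eta * Qd / t, 1)
    return pi_g, pi_d

# An EQUILIBRIUM-RANKING-style score would then combine the two refined
# policies, e.g. scores = pi_g[0] * pi_d[:, 0], and pick the argmax.
```

Anchoring the updates to the LM's initial policies (the `lam` term) is what keeps the learned equilibrium from drifting to an arbitrary signaling convention that discards what the LM already knows.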
Here, the researchers demonstrate that these algorithms can also be applied to tasks involving free-form language generation. The resulting game-theoretic LM decoding procedure is called EQUILIBRIUM-RANKING. Applied to six question-answering benchmarks (MMLU, ARC, RACE, HHH, TruthfulQA, and GSM8K), EQUILIBRIUM-RANKING significantly outperforms existing generative, discriminative, and mixed decoding techniques. More broadly, their findings show how the game-theoretic toolkit can be used to formalize and improve coherence in LMs, and that greater coherence in turn improves accuracy on factual tasks.