Autoregressive models are a class of statistical models built on the intuition that the current value of a variable depends strongly on its past values. In other words, the model predicts the future value of a variable by regressing on its past values. Among the best-known examples of autoregressive models is the GPT family of language models, especially GPT-3 and its variants, which are trained to predict the next word in a sequence given the previous words. By training GPT autoregressively on a large corpus of text, the model learns to capture the statistical patterns, dependencies, and semantic relationships in language, allowing it to generate contextually relevant text conditioned on the input prompt. However, previous research has shown that smaller models, or models tuned to have less randomness or variability (i.e., lower generation temperatures), tend to produce repetitive or erroneous text. Moreover, in certain scenarios these models use their own outputs as inputs, which often leads to compounding errors that quickly push the model outside the distribution it was trained on.
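The autoregressive idea described above can be sketched with a toy bigram model: estimate the probability of the next word given only the previous word from a tiny corpus, then generate by repeatedly sampling from that conditional. This is a minimal illustration of the same principle GPT applies with a neural network over long contexts, not an implementation of any model discussed here.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; real models train on billions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram transitions: counts[prev][next]
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(word, rng):
    """Sample the next word in proportion to observed transition counts."""
    words, weights = zip(*counts[word].items())
    return rng.choices(words, weights=weights)[0]

def generate(start, length, seed=0):
    """Autoregressive generation: each new word conditions only on the past."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        if not counts[out[-1]]:  # dead end: no observed continuation
            break
        out.append(sample_next(out[-1], rng))
    return " ".join(out)

print(generate("the", 5))
```

Because the model samples from its own partial output at every step, a single unlikely draw changes all subsequent conditioning, which is exactly how the compounding errors described above arise.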
To overcome these challenges, a team of Stanford researchers conducted preliminary studies and identified two major obstacles that prevent autoregressive models trained with maximum likelihood estimation (MLE) from generating consistent sequences at test time. The first problem lies in the divergence measure used to assess the disparity between the model and the data distribution. Since MLE does not take out-of-distribution (OOD) sequences into account, the model's behavior on such sequences cannot be controlled. To address this, the researchers proposed minimizing the χ² divergence between a pool of real data and the autoregressively generated sequences, which shows superior performance compared to MLE. The second challenge arises when the model produces an OOD token for which no continuation aligned with the data distribution exists. To address this, the researchers introduce a backspace action into the generation process, allowing the model to erase the previous token and rectify any mistakes it may have made.
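A toy numeric example can build intuition for why a χ²-style divergence reacts far more strongly than the KL objective implied by MLE when a model leaks probability mass onto sequences the data distribution essentially never produces. This simplifies the paper's actual objective (which involves occupancy measures over sequence states); the distributions and divergence directions here are illustrative assumptions only.

```python
import math

def kl(p, q):
    """Forward KL divergence KL(p || q) = sum p*log(p/q); what MLE minimizes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chi2(q, p):
    """Pearson chi-squared divergence chi^2(q || p) = sum (q - p)^2 / p."""
    return sum((qi - pi) ** 2 / pi for qi, pi in zip(q, p))

# Data distribution over three candidate sequences; the third is essentially OOD.
p = [0.5, 0.5, 1e-6]
# A model that leaks 10% of its mass onto the OOD sequence.
q = [0.45, 0.45, 0.10]

print(f"KL(p||q)   = {kl(p, q):.4f}")    # barely notices the OOD mass
print(f"chi2(q||p) = {chi2(q, p):.1f}")  # blows up on the OOD mass
```

The forward KL weights errors by the data probability, so mass the model places on near-zero-probability sequences contributes almost nothing to the MLE loss, whereas the χ²-style penalty grows without bound there.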
Drawing on these findings from their preliminary studies, the Stanford researchers devised a novel method called SequenceMatch, which enables the training of autoregressive models against different divergence measures while adding a backspace action that allows the model to correct errors. The researchers recast sequence generation as a reinforcement learning problem that, in simple terms, amounts to choosing the next action (in this case, generating the next token) for a given state (i.e., a partial sequence). By using recent developments in non-adversarial imitation learning, a framework within reinforcement learning, the researchers were able to reduce the divergence between the occupancy measure of the trained model and that of the real data distribution. In addition, to further minimize compounding error in sequence generation, the autoregressive model was trained with a backspace action, unlike MLE, to facilitate backtracking by allowing the model to remove tokens. This fully supervised loss technique for language modeling, SequenceMatch, can be used as an additional step to fine-tune pretrained models.
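The backspace mechanism can be sketched as a decoding loop in which the policy either emits a token or emits a special backspace action that deletes the most recent token. All names here are hypothetical illustrations of the idea, not the paper's actual implementation; the scripted "policy" simply demonstrates the recovery behavior.

```python
BACKSPACE = "<backspace>"

def decode(policy, max_steps):
    """Run a policy that maps a prefix (partial sequence) to the next action:
    either a token string to append, or BACKSPACE to erase the last token."""
    seq = []
    for _ in range(max_steps):
        action = policy(tuple(seq))
        if action == BACKSPACE:
            if seq:
                seq.pop()          # erase the previous token
        else:
            seq.append(action)
    return seq

# Scripted policy for demonstration: emits a wrong token, then corrects it
# instead of being forced to continue from the mistake.
script = iter(["the", "catt", BACKSPACE, "cat", "sat"])
result = decode(lambda prefix: next(script), max_steps=5)
print(result)  # ['the', 'cat', 'sat']
```

Under MLE decoding there is no such escape: once an out-of-distribution token is emitted, every later token must condition on it, which is the compounding-error failure mode SequenceMatch targets.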
The researchers performed several experimental evaluations comparing GPT-2-based models fine-tuned with SequenceMatch against models trained with MLE. Using the MAUVE score as the performance metric, they found that models fine-tuned with SequenceMatch generated text closer to the dataset distribution that appeared more fluent and error-free than text from models trained with MLE. The team also highlighted a limitation of their method: it requires more computational resources and time to generate long texts. As for future work, the researchers plan to study how different divergence measures affect the quality of the generated sequences.
Check out the Paper.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing, and web development. She likes to learn more about the technical field by participating in various challenges.