Magpie-ultra, a new dataset for supervised fine-tuning from the Argilla team, has been released with 50,000 instruction-response pairs. This synthetically generated dataset uses the Llama 3.1 405B-Instruct model together with other Llama models such as Llama-Guard-3-8B and Meta-Llama-3.1-8B-Instruct. The dataset covers tasks such as coding, mathematics, data analysis, creative writing, advice seeking, and brainstorming, and offers challenging instructions and responses to improve AI model training.
This dataset is built with distilabel, and its creation follows the Magpie recipe described in the paper "Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". This iteration differs from the original Magpie by using the new Llama 3.1 family of models and by generating a more focused set of 50,000 instruction-response pairs, compared to the original's 1 million. The process uses multiple models for instruction generation, response creation, quality assessment, and safety classification.
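To illustrate the core idea behind the recipe, here is a minimal sketch using the Hugging Face transformers library rather than the actual distilabel pipeline: an aligned chat model is prompted with only the user-turn ("pre-query") template, so its continuation is itself a synthetic instruction, which is then fed back through the normal chat template to obtain a response. The model id, sampling parameters, and the Llama 3.1 template string are illustrative assumptions, not the exact configuration used to build Magpie-ultra.

```python
# Hedged sketch of the Magpie recipe with transformers (not the distilabel pipeline).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Step 1: the "pre-query" prompt ends exactly where a user message would begin,
# so the model's continuation is a synthetic instruction.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
inputs = tokenizer(pre_query, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0)
instruction = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Step 2: answer the synthesized instruction with the full chat template.
messages = [{"role": "user", "content": instruction}]
chat_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(chat_inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(out[0, chat_inputs.shape[1]:], skip_special_tokens=True)

print(instruction, response, sep="\n---\n")
```

In the actual Magpie-ultra pipeline, this generation is orchestrated by distilabel with the 405B model, plus additional stages for quality scoring and safety classification.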
The generation process ran on a single 8xH100 machine, and creating the instruction-response pairs took approximately 60 hours. Additional steps, such as generating responses with the base model, computing embeddings, evaluating quality and difficulty, and classifying instructions, required approximately 51 hours in total. The result is a dataset that carries multiple data points for each instruction.
The dataset structure includes several columns that provide detailed information about each instruction-response pair. Key columns include the instruction itself, the responses from the instruct and base models, the intent, the knowledge required, the difficulty level, a quality rating, and a category classification. Additionally, the dataset incorporates safety checks using Llama-Guard-3-8B and provides embedding information for each instruction.
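As a quick way to explore these columns, the snippet below loads the dataset with the Hugging Face datasets library. The repository id "argilla/magpie-ultra-v0.1", the split name, and the column names shown in comments are assumptions to be verified against the dataset card.

```python
# Minimal sketch for inspecting the dataset; repo id and split are assumed.
from datasets import load_dataset

ds = load_dataset("argilla/magpie-ultra-v0.1", split="train")

print(ds.column_names)     # e.g. instruction, responses, scores, difficulty, quality, category, ...
row = ds[0]
print(row["instruction"])  # one synthesized instruction (column name assumed)
```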
One of the strengths of the dataset lies in its potential applications. It can be used for supervised fine-tuning (SFT) or direct preference optimization (DPO), in the latter case by exploiting the difference in scores between the instruct model's responses and those of the base model. This flexibility allows researchers and developers to tailor the dataset to their specific needs when training and optimizing AI models.
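A minimal sketch of the DPO use case might look like the following, where the instruct model's response is treated as "chosen" and the base model's response as "rejected" whenever their quality scores differ by a margin. The column names (instruction, response, response_base, score, score_base) and the repo id are assumptions; check the dataset card for the actual fields.

```python
# Hedged sketch: build DPO preference pairs from the score gap between the
# instruct-model response and the base-model response. Column names are assumed.
from datasets import load_dataset

ds = load_dataset("argilla/magpie-ultra-v0.1", split="train")  # repo id assumed

def to_dpo_pair(row, min_margin=0.5):
    """Return a prompt/chosen/rejected dict, or None if the responses score too closely."""
    margin = row.get("score", 0.0) - row.get("score_base", 0.0)
    if margin < min_margin:
        return None
    return {
        "prompt": row["instruction"],
        "chosen": row["response"],         # instruct-model response (assumed column)
        "rejected": row["response_base"],  # base-model response (assumed column)
    }

pairs = [p for p in (to_dpo_pair(r) for r in ds) if p is not None]
print(f"Kept {len(pairs)} preference pairs out of {len(ds)} rows")
```

The margin threshold is a design choice: keeping only pairs with a clear quality gap tends to give cleaner preference signal at the cost of fewer training examples.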
While this release marks a significant advancement in AI training data, it is important to note its limitations. This version is unfiltered, and a filtered version is planned for a future release. Additionally, the dataset is not yet fully balanced, an issue that will be addressed in future iterations. Despite these limitations, Magpie-ultra represents a valuable resource for improving AI capabilities across multiple domains.
Review the Pipeline and Dataset. All credit for this research goes to the researchers of this project.
Asjad is a consultant intern at Marktechpost. He is pursuing a Bachelor's degree in Mechanical Engineering at the Indian Institute of Technology, Kharagpur. Asjad is a Machine Learning and Deep Learning enthusiast who is always researching the applications of Machine Learning in the healthcare domain.