Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly written. Current scaling laws show that learning from such data demands large amounts of both compute and data, and these requirements grow with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pretraining, and because of the impending scarcity of high-quality data on the web. In this work, we propose Web Reformulation Augmented Pretraining (WRAP), which uses an off-the-shelf instruction-tuned model prompted to paraphrase documents on the web in specific styles, such as “Wikipedia-like” or “question-answer format”, to jointly pretrain large language models on real data and its synthetic reformulations. We first show that using WRAP on the naturally noisy C4 dataset speeds up pretraining by about 3x. At the same pretraining compute budget, it improves perplexity by more than 10% on average across different subsets of the Pile, and improves zero-shot question-answering accuracy across 13 tasks by more than 2%. Second, we investigate the impact of the reformulation style on model performance, providing insights into how the composition of the training data can affect the performance of LLMs in out-of-distribution (OOD) settings. Our gains are attributed to the fact that reformulated synthetic data has higher utility than real data because it (i) incorporates style diversity that closely mirrors downstream evaluation styles, and (ii) has higher “quality” than data scraped from the web.
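To make the recipe concrete, the sketch below illustrates the reformulation step under stated assumptions: an instruction-tuned model is prompted to rewrite each web document in a chosen style, and the synthetic text is interleaved with the original for pretraining. The specific model name, prompt wording, and helper functions (`rephrase`, `build_training_mix`) are illustrative assumptions, not the exact configuration used in this work.

```python
# Minimal sketch of WRAP-style reformulation, assuming a Hugging Face
# instruction-tuned model. Model choice and prompts are illustrative.
from transformers import pipeline

# Hypothetical off-the-shelf instruction-tuned paraphraser.
paraphraser = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

STYLE_PROMPTS = {
    "wikipedia": "Rewrite the following passage in a high-quality, Wikipedia-like style:\n\n",
    "qa": "Convert the following passage into a question-answer format:\n\n",
}

def rephrase(document: str, style: str = "wikipedia", max_new_tokens: int = 512) -> str:
    """Paraphrase one web document in the requested style."""
    prompt = STYLE_PROMPTS[style] + document
    out = paraphraser(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    # The pipeline echoes the prompt by default; keep only the new text.
    return out[0]["generated_text"][len(prompt):].strip()

def build_training_mix(real_docs, style: str = "wikipedia"):
    """Yield a joint real + synthetic corpus: each web document contributes
    both its original text and a stylistic reformulation."""
    for doc in real_docs:
        yield doc                    # original (noisy) web text
        yield rephrase(doc, style)   # synthetic reformulation
```

The key design choice reflected here is that synthetic reformulations augment rather than replace the real data, so the pretraining mix retains the coverage of the web while gaining stylistic diversity and cleaner text.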