Speculative decoding is a prominent technique for accelerating the inference of a large target language model based on predictions from an auxiliary draft model. While effective, in application-specific settings it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, these draft models add significant complexity to inference systems. We propose Speculative Streaming, a single-model speculative decoding method that fuses drafting into the target model by changing the fine-tuning objective from next token prediction to future n-gram prediction. Speculative Streaming speeds up decoding by 1.8 to 3.1 times on a diverse set of tasks, such as summarization, structured queries, and meaning representation, without sacrificing generation quality. Additionally, Speculative Streaming is parameter efficient: it achieves speedups equal to or greater than Medusa-style architectures while using ~10,000x fewer additional parameters, making it well suited for resource-constrained devices.
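To make the future n-gram objective concrete, the following is a minimal sketch, not the paper's implementation: a toy model with one output head per future offset, trained so that head k predicts the token k+1 positions ahead from the same hidden state. The tiny GRU trunk, head design, and hyperparameters (VOCAB, DIM, NGRAM) are illustrative assumptions, not the architecture described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, NGRAM = 1000, 64, 4  # assumed toy hyperparameters

class TinyNgramLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.trunk = nn.GRU(DIM, DIM, batch_first=True)  # toy stand-in for the target model
        # One head per future offset 1..NGRAM (offset 1 is the ordinary next-token head).
        self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(NGRAM))

    def forward(self, tokens):
        hidden, _ = self.trunk(self.embed(tokens))
        return [head(hidden) for head in self.heads]  # list of (B, T, VOCAB) logits

def ngram_loss(logits_per_offset, tokens):
    """Sum cross-entropy over offsets: head k predicts the token at position t + k + 1."""
    total = 0.0
    for k, logits in enumerate(logits_per_offset):
        shift = k + 1
        pred = logits[:, :-shift, :].reshape(-1, VOCAB)   # positions that have a target shift steps ahead
        target = tokens[:, shift:].reshape(-1)            # tokens shift steps in the future
        total = total + F.cross_entropy(pred, target)
    return total

if __name__ == "__main__":
    model = TinyNgramLM()
    batch = torch.randint(0, VOCAB, (2, 32))
    loss = ngram_loss(model(batch), batch)
    loss.backward()
    print(float(loss))
```

At inference time, the extra heads supply the speculative draft tokens that the target model then verifies, so no separate draft model is needed; the sketch above only illustrates the training-objective change, with the verification loop omitted.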