Diffusion models (DMs) have recently emerged as state-of-the-art tools for generative modeling across various domains. Standard DMs can be viewed as an instantiation of hierarchical variational autoencoders (VAEs) in which the latent variables are inferred from input-centered Gaussian distributions with fixed scales and variances. Unlike VAEs, this formulation prevents DMs from changing the latent space or learning abstract representations. In this work, we propose f-DM, a generalized family of DMs that allows progressive transformation of the signal. More precisely, we extend DMs to incorporate a set of (hand-designed or learned) transformations, where the transformed input serves as the mean of each diffusion step. We present a generalized formulation and derive the corresponding denoising objective together with a modified sampling algorithm. As a demonstration, we apply f-DM to image generation tasks with a variety of functions, including downsampling, blurring, and learned transformations based on the encoder of a pretrained VAE. Furthermore, we identify the importance of adjusting the noise levels whenever the signal is subsampled, and we propose a simple rescaling recipe. f-DM produces high-quality samples on standard image generation benchmarks such as FFHQ, AFHQ, LSUN, and ImageNet, with better efficiency and semantic interpretation.
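To make the "transformed input is the mean of each diffusion step" idea concrete, the following is a minimal sketch of the generalized forward process; the notation ($\alpha_t$, $\sigma_t$, $f_t$, $\mathbf{z}_t$) is assumed for illustration and is not fixed by the abstract:

\[
\underbrace{q(\mathbf{z}_t \mid \mathbf{x}) = \mathcal{N}\!\left(\mathbf{z}_t;\ \alpha_t\, \mathbf{x},\ \sigma_t^2 \mathbf{I}\right)}_{\text{standard DM}}
\quad\longrightarrow\quad
\underbrace{q(\mathbf{z}_t \mid \mathbf{x}) = \mathcal{N}\!\left(\mathbf{z}_t;\ \alpha_t\, f_t(\mathbf{x}),\ \sigma_t^2 \mathbf{I}\right)}_{\text{f-DM}},
\]

where $f_t$ is one of the hand-designed or learned transformations (e.g., downsampling, blurring, or a pretrained VAE encoder), and setting $f_t = \mathrm{id}$ recovers the standard DM. When $f_t$ subsamples the signal, one illustrative reading of the rescaling recipe above, not necessarily the exact formula used, is to shrink the noise level together with the resolution, e.g., $\sigma_t \mapsto \sigma_t / d$ for a downsampling factor $d$, so that the signal-to-noise ratio stays comparable across resolutions.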