This paper was accepted to the NeurIPS 2023 Workshop on Diffusion Models.
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1 kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that supports both reconstruction and classification losses, or any combination of the two. This approach ensures that the generated audio can match its surrounding context, or conform to a class distribution or latent representation specified relative to any suitable pre-trained classifier or embedding model.
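As an illustration of the idea, the following is a minimal, self-contained sketch of sampling-time guidance in PyTorch. The toy networks (`ToyDenoiser`, `ToyEmbedder`), the sampler update, and the guidance step size are assumptions made for the sake of a runnable example, not the models or schedule used in the paper: a reconstruction loss pulls the denoised estimate toward a known context on a masked region, an embedding loss pulls it toward a target classifier embedding, and their gradients steer the sampler.

```python
import torch

# Toy stand-ins for the pretrained networks assumed by the method; the names
# ToyDenoiser / ToyEmbedder and all hyperparameters are placeholders.
class ToyDenoiser(torch.nn.Module):
    """Predicts the clean signal x0 from a noisy input at noise level sigma."""
    def __init__(self, channels=2):
        super().__init__()
        self.net = torch.nn.Conv1d(channels, channels, kernel_size=5, padding=2)

    def forward(self, x_noisy, sigma):
        return self.net(x_noisy)

class ToyEmbedder(torch.nn.Module):
    """Maps audio to a fixed-size embedding (stand-in for a pretrained classifier)."""
    def __init__(self, channels=2, dim=32):
        super().__init__()
        self.proj = torch.nn.Linear(channels, dim)

    def forward(self, x):
        return self.proj(x.mean(dim=-1))

def guided_sample(denoiser, sigmas, shape, *, context=None, mask=None,
                  embedder=None, target_emb=None, w_rec=1.0, w_cls=1.0):
    """Deterministic sampler with sampling-time guidance: the predicted clean
    signal is pushed toward (i) the known context where mask == 1
    (reconstruction loss) and (ii) a target embedding under a pretrained
    embedder (classification/embedding loss)."""
    x = torch.randn(shape) * sigmas[0]
    for i, sigma in enumerate(sigmas[:-1]):
        x = x.detach().requires_grad_(True)
        x0_hat = denoiser(x, sigma)

        loss = torch.zeros(())
        if context is not None and mask is not None:
            loss = loss + w_rec * ((mask * (x0_hat - context)) ** 2).mean()
        if embedder is not None and target_emb is not None:
            sim = torch.nn.functional.cosine_similarity(
                embedder(x0_hat), target_emb, dim=-1)
            loss = loss + w_cls * (1.0 - sim).mean()

        grad = (torch.autograd.grad(loss, x)[0]
                if loss.requires_grad else torch.zeros_like(x))
        with torch.no_grad():
            d = (x - x0_hat) / sigma  # score-like direction
            # Euler step plus a heuristically scaled guidance gradient step.
            x = x + (sigmas[i + 1] - sigma) * d - sigma * grad
    return x.detach()

# Example: one second of 44.1 kHz stereo audio, guided only by a (random)
# target embedding; the reconstruction and embedding terms can be combined freely.
denoiser, embedder = ToyDenoiser(), ToyEmbedder()
sigmas = torch.linspace(1.0, 1e-3, steps=50)
audio = guided_sample(denoiser, sigmas, (1, 2, 44100),
                      embedder=embedder, target_emb=torch.randn(32))
```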
In Table 1 we show randomly chosen samples for a series of creative applications, each conditioned on a given audio prompt. For each task and prompt we show samples from the different models described in the paper.
Task types (a sketch after this list shows one way each task can be phrased as a mask and guidance target for the sampler):
- inpainting: replaces the middle two seconds of the prompt
- regeneration: regenerates the middle two seconds of the prompt
- continuation: generates a new continuation from the first 2.4 seconds of the prompt
- transitions: regenerates a blended section between two tracks
- guidance: generates a new clip conditioned on matching the prompt's classifier embedding
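As referenced above, each task amounts to a choice of context, mask, and guidance target for a sampler like the one sketched earlier. The helper names, window placement, and shapes below are illustrative assumptions; only the 2-second and 2.4-second windows come from the task list.

```python
import torch

SR = 44100  # 44.1 kHz stereo audio, as in the paper

def middle_mask(length, seconds=2.0, sr=SR):
    """Mask that is 1 on the known prompt and 0 on the middle `seconds` to be
    (re)generated -- one way to set up the inpainting / regeneration tasks."""
    mask = torch.ones(1, 2, length)
    mid, half = length // 2, int(seconds * sr) // 2
    mask[..., mid - half:mid + half] = 0.0
    return mask

def continuation_mask(length, seconds=2.4, sr=SR):
    """Keep the first `seconds` of the prompt fixed and generate the rest."""
    mask = torch.zeros(1, 2, length)
    mask[..., :int(seconds * sr)] = 1.0
    return mask

def transition_context(track_a, track_b, blend_seconds=2.0, sr=SR):
    """Context for a transition: the opening of track A followed by the ending
    of track B, with the blended section in between regenerated (mask == 0)."""
    length = track_a.shape[-1]
    context = torch.cat([track_a[..., :length // 2],
                         track_b[..., -(length - length // 2):]], dim=-1)
    return context, middle_mask(length, blend_seconds, sr)

# The guidance task uses no mask at all: only the embedding term of the
# sampler, with target_emb set to the classifier embedding of the prompt.
```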
| prompt | task | CQTDiff (baseline) | latent | waveform |
|---|---|---|---|---|
| (audio) | inpainting | (audio) | (audio) | (audio) |
| (audio) | inpainting | (audio) | (audio) | (audio) |
| (audio) | inpainting | (audio) | (audio) | (audio) |
| (audio) | regeneration | (audio) | (audio) | (audio) |
| (audio) | regeneration | (audio) | (audio) | (audio) |
| (audio) | regeneration | (audio) | (audio) | (audio) |
| (audio) | continuation | (audio) | (audio) | (audio) |
| (audio) | continuation | (audio) | (audio) | (audio) |
| (audio) | continuation | (audio) | (audio) | (audio) |
| (audio) | transitions | (audio) | (audio) | (audio) |
| (audio) | transitions | (audio) | (audio) | (audio) |
| (audio) | transitions | (audio) | (audio) | (audio) |
| (audio) | guidance | (audio) | (audio) | (audio) |
| (audio) | guidance | (audio) | (audio) | (audio) |
| (audio) | guidance | (audio) | (audio) | (audio) |
The prompts are taken from a test split of the Free Music Archive dataset, published by Michaël Defferrard et al. under a Creative Commons Attribution 4.0 International License (CC BY 4.0).