Now that we've talked about the relevance of mono-to-stereo technology, you may be wondering how it works under the hood. It turns out that there are several approaches to tackling this problem with AI. Next, I want to show four different methods, ranging from traditional signal processing to generative AI. This is not meant as a complete list of methods, but rather as an inspiration for how this task has been approached over the past 20 years.
Traditional signal processing: sound source formation
Before machine learning became as popular as it is today, the field of Music Information Retrieval (MIR) was dominated by clever, hand-crafted algorithms. Not surprisingly, such methods also exist for mono-to-stereo mixing.
The fundamental idea behind a 2007 paper by Lagrange, Martins, and Tzanetakis (1) is simple:
If we can find the different sound sources in a recording and extract them from the signal, we can remix them for a realistic stereo experience.
This sounds simple, but how can we know what the sound sources in the signal are? How do we define them clearly enough that an algorithm can extract them from the signal? These questions are difficult to solve, and the paper uses a variety of advanced methods to achieve this. In essence, this is the algorithm they came up with:
- Divide the recording into short fragments and identify peak frequencies (dominant notes) in each fragment
- Identify which peaks go together (a sound source) using a clustering algorithm
- Decide where each sound source must be placed in the stereo mix (manual step)
- For each sound source, extract its assigned frequencies from the signal
- Mix all extracted sources together to form the final stereo mix.
Although quite complex in its details, the intuition is clear: find the sources, extract them, and remix them.
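A heavily simplified sketch of this find-extract-remix idea might look as follows, assuming an STFT front end and k-means clustering as a stand-in for the paper's far more careful peak tracking and clustering. This is an illustration of the intuition, not the authors' implementation.

```python
import numpy as np
from scipy.signal import stft, istft
from sklearn.cluster import KMeans

def toy_find_extract_remix(mono, sr, n_sources=3):
    """Toy version: cluster frames by dominant frequency and pan each
    cluster to a fixed stereo position. Not the paper's algorithm."""
    f, t, Z = stft(mono, fs=sr, nperseg=2048)

    # Step 1: dominant peak frequency in each short fragment (frame)
    peak_freqs = f[np.abs(Z).argmax(axis=0)]

    # Step 2: group frames whose peaks lie close together into "sources"
    labels = KMeans(n_clusters=n_sources, n_init=10).fit_predict(
        peak_freqs.reshape(-1, 1))

    # Step 3: pick a pan position per source (a manual step in the paper)
    pans = np.linspace(0.25, 0.75, n_sources)   # 0 = left, 1 = right

    # Steps 4-5: extract each source's frames, pan, and sum into L/R
    L = np.zeros_like(Z)
    R = np.zeros_like(Z)
    for k in range(n_sources):
        mask = labels == k
        L[:, mask] = Z[:, mask] * np.sqrt(1 - pans[k])   # constant-power pan
        R[:, mask] = Z[:, mask] * np.sqrt(pans[k])

    _, left = istft(L, fs=sr, nperseg=2048)
    _, right = istft(R, fs=sr, nperseg=2048)
    return np.stack([left, right])
```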
A quick solution: source separation / stem splitting
A lot has happened since Lagrange's 2007 paper. Since Deezer released its stem splitting tool Spleeter in 2019, AI-based source separation systems have become remarkably useful. Notable players such as Lalal.ai (https://www.lalal.ai/) or Audioshake (https://www.audioshake.ai/instrument-stem-separation) make a quick solution possible:
- Separate a mono recording into its individual instrument stems using a free or commercial stem splitter.
- Load the stems into a digital audio workstation (DAW) and mix them to your liking (a scripted version of this step is sketched below).
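Here is a minimal scripted version of that workflow, assuming Spleeter's Python API, a 44.1 kHz mono input file, and pan positions I chose arbitrarily for illustration. The filenames are placeholders.

```python
import numpy as np
import soundfile as sf
from spleeter.separator import Separator

separator = Separator('spleeter:4stems')      # vocals, drums, bass, other
mono, sr = sf.read('song_mono.wav')           # placeholder input file
waveform = np.stack([mono, mono], axis=1)     # Spleeter expects 2 channels
stems = separator.separate(waveform)          # dict: stem name -> audio

# Arbitrary pan positions per stem (0 = hard left, 1 = hard right)
pans = {'vocals': 0.5, 'drums': 0.5, 'bass': 0.35, 'other': 0.65}

mix = np.zeros_like(waveform)
for name, audio in stems.items():
    stem_mono = audio.mean(axis=1)            # collapse stem to mono
    p = pans[name]
    mix[:, 0] += stem_mono * np.sqrt(1 - p)   # constant-power panning
    mix[:, 1] += stem_mono * np.sqrt(p)

sf.write('song_stereo.wav', mix, sr)
```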
This technique was already used in a 2011 research paper (see (2)), but it has become much more viable since then due to recent improvements in stem separation tools.
The disadvantage of source separation approaches is that they produce noticeable sound artifacts, because source separation itself is still not without its flaws. Furthermore, these approaches still require manual mixing by a human, making them only semi-automatic.
To fully automate mono-to-stereo mixing, machine learning is required. By learning from real stereo mixes, an ML system can imitate the mixing style of real human producers.
Machine learning with parametric stereo
At ISMIR 2023, Serrà and colleagues presented a very creative and efficient way to use machine learning for mono-to-stereo mixing (3). Their work builds on an audio compression technique called parametric stereo. Stereo mixes consist of two audio channels, which makes them hard to deliver in low-bandwidth settings such as music streaming, radio broadcasts, or telephone connections.
Parametric stereo is a technique for creating stereo sound from a single mono signal by focusing on the key spatial cues our brain uses to determine where sounds come from. These cues are:
- how loud a sound is in the left ear versus the right ear (Interchannel Intensity Difference, IID)
- how synchronized the left and right signals are in terms of time or phase (Interchannel Time or Phase Difference)
- how similar or different the signals arriving at each ear are (Interchannel Correlation, IC)
Using these parameters, a stereo experience can be created from nothing more than a mono signal.
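To make these cues concrete, here is a toy decoder in the spirit of parametric stereo: it takes a mono signal plus per-frequency-bin IID and IC values and synthesizes two channels. The decorrelator, the parameter shapes, and all constants are simplifying assumptions for illustration; real codecs (and the paper's model) are considerably more careful.

```python
import numpy as np
from scipy.signal import stft, istft

def toy_ps_decode(mono, sr, iid_db, ic, nperseg=2048):
    """iid_db, ic: one value per STFT frequency bin (nperseg // 2 + 1)."""
    _, _, M = stft(mono, fs=sr, nperseg=nperseg)

    # Crude decorrelator: give each bin a fixed random phase offset
    # (a stand-in for the carefully designed all-pass filters in real codecs)
    rng = np.random.default_rng(0)
    D = M * np.exp(1j * rng.uniform(0, 2 * np.pi, size=(M.shape[0], 1)))

    # IID: distribute energy between channels while preserving total power
    g = 10 ** (iid_db[:, None] / 20)          # linear left/right gain ratio
    gl = np.sqrt(2 * g**2 / (1 + g**2))
    gr = np.sqrt(2 / (1 + g**2))

    # IC: blend mono with a decorrelated copy to hit the target correlation
    a = np.sqrt((1 + ic[:, None]) / 2)
    b = np.sqrt((1 - ic[:, None]) / 2)
    _, left = istft(gl * (a * M + b * D), fs=sr, nperseg=nperseg)
    _, right = istft(gr * (a * M - b * D), fs=sr, nperseg=nperseg)
    return np.stack([left, right])
```

For example, calling this with `iid_db = np.linspace(-6, 6, 1025)` and `ic = np.full(1025, 0.7)` spreads the spectrum from left to right with a moderately wide stereo image.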
This is the approach the researchers took to develop their mono-to-stereo mixing model (a minimal training sketch follows the list):
- Collect a large dataset of stereo music tracks
- Convert stereo tracks to parametric stereo (mono + spatial parameters)
- Train a neural network to predict the spatial parameters given a mono recording
- To convert a new mono signal to stereo, use the trained model to infer spatial parameters from the mono signal and combine the two into a parametric stereo experience
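As a rough illustration of this pipeline, here is a minimal training sketch that uses only the IID cue: it derives per-band level differences from stereo spectrograms as targets and trains a small network to predict them from the mono downmix. The dummy data, the IID-only simplification, and the architecture are my assumptions, not the authors' setup.

```python
import torch
import torch.nn as nn

def iid_targets(stereo_mag, eps=1e-8):
    """Per-(band, frame) level difference in dB. stereo_mag: (B, 2, F, T)."""
    left, right = stereo_mag[:, 0], stereo_mag[:, 1]
    return 20 * torch.log10((left + eps) / (right + eps))

# Toy predictor: mono magnitude spectrogram -> IID per (band, frame)
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Placeholder batches of |STFT| magnitudes; a real dataset goes here
stereo_batches = [torch.rand(4, 2, 257, 100) for _ in range(3)]

for stereo_mag in stereo_batches:
    mono_mag = stereo_mag.mean(dim=1, keepdim=True)   # downmix to mono
    target = iid_targets(stereo_mag).unsqueeze(1)     # spatial parameters
    loss = nn.functional.mse_loss(model(mono_mag), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```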
Currently, there does not appear to be any code or listening demos available for this paper. The authors themselves concede that "there is still a gap between professional stereo mixes and the proposed approaches" (p. 6). Still, the paper describes a creative and efficient way to achieve fully automated mono-to-stereo mixing using machine learning.
Generative AI: transformer-based synthesis
Now we get to the seemingly simplest way to generate stereo from mono: training a generative model to take a mono input and synthesize both stereo output channels directly. Although conceptually simple, this is by far the most technically challenging approach. One second of high-resolution audio consists of 44,100 data points per channel. Generating a three-minute song in stereo therefore means generating more than 15 million data points (3 × 60 s × 44,100 samples/s × 2 channels ≈ 15.9 million).
With today's technologies, such as convolutional neural networks, transformers, and neural audio codecs, the complexity of the task is starting to become manageable. Some papers have chosen to generate stereo signals through direct neural synthesis (see (4), (5), (6)). However, only (5) trains a model that solves mono-to-stereo generation out of the box. My intuition is that there is room for a paper dedicated to the "simple" mono-to-stereo generation task that focuses 100% on solving this goal. Anyone here looking for a PhD topic?
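To make the setup concrete, here is a deliberately tiny sketch of direct neural synthesis: a model that maps a raw mono waveform to two output channels. Everything here is an assumption for illustration; the cited papers use transformer architectures operating on neural codec tokens, not a few raw-waveform convolutions.

```python
import torch
import torch.nn as nn

class ToyMonoToStereo(nn.Module):
    """Toy direct-synthesis model: mono waveform in, two channels out."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=15, padding=7),
            nn.GELU(),
            nn.Conv1d(hidden, hidden, kernel_size=15, padding=7),
            nn.GELU(),
            nn.Conv1d(hidden, 2, kernel_size=15, padding=7),  # L and R
        )

    def forward(self, mono):                 # mono: (batch, 1, n_samples)
        return self.net(mono)                # (batch, 2, n_samples)

model = ToyMonoToStereo()
mono = torch.randn(1, 1, 44100)              # one second at 44.1 kHz
stereo = model(mono)
print(stereo.shape)                          # torch.Size([1, 2, 44100])
```

Even this toy makes the scaling problem visible: the output has twice as many samples as the input, and a real model has to keep the two channels both plausible on their own and spatially consistent with each other.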