Neural audio codecs have transformed how audio is compressed and handled by converting continuous audio signals into discrete tokens. This tokenization lets generative models trained on discrete tokens produce complex audio while maintaining excellent quality. Neural codecs have also significantly improved audio compression, allowing audio data to be stored and transferred more efficiently without compromising sound quality.
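The core idea above can be illustrated with a minimal NumPy sketch of the quantization step: a continuous latent frame is mapped to the index of its nearest codebook vector, and the decoder consumes those discrete indices. The random codebook and latent values here are placeholders, not SD-Codec's actual learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder" output: 10 continuous latent frames of dimension 4
latents = rng.normal(size=(10, 4))

# Codebook of 16 embedding vectors (random here; learned in a real codec)
codebook = rng.normal(size=(16, 4))

def quantize(frames, codebook):
    """Map each continuous frame to the index of its nearest codebook vector."""
    # Pairwise squared distances: shape (num_frames, codebook_size)
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

tokens = quantize(latents, codebook)  # discrete token ids, shape (10,)
recon = codebook[tokens]              # embeddings a decoder would consume
```

Only the integer `tokens` need to be stored or transmitted; the decoder looks the embeddings back up from its copy of the codebook.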
However, many neural audio codec models currently in use were not designed to distinguish between distinct sound domains; instead, they were trained on large, varied audio datasets. For example, the harmonics and structure of spoken language differ greatly from those of music or environmental sounds. This inability to distinguish between audio domains makes it difficult to model the data effectively and to control sound generation. Such models struggle to handle the distinctive qualities of different audio domains, which can result in sub-optimal performance, particularly in applications that require precise control over sound generation.
To overcome these issues, a team of researchers has introduced the Source-Disentangled Neural Audio Codec (SD-Codec), a technique that combines source separation and audio coding. SD-Codec aims to improve upon current neural codecs by explicitly identifying and classifying audio signals into distinct domains. Unlike other latent-space compression techniques, SD-Codec assigns discrete representations, or distinct codebooks, to different audio sources, including music, sound effects, and speech. This division helps the model recognize and preserve the distinctive qualities of each audio domain.
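The per-source codebook idea can be sketched as follows: each separated source stream is tokenized only against its own domain's codebook. The domain names, dimensions, and random values below are illustrative assumptions, not SD-Codec's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, codebook_size = 4, 32

# Hypothetical per-domain codebooks, one per source type
codebooks = {
    "speech": rng.normal(size=(codebook_size, dim)),
    "music": rng.normal(size=(codebook_size, dim)),
    "sfx": rng.normal(size=(codebook_size, dim)),
}

def quantize(frames, codebook):
    """Nearest-neighbor lookup of each frame against one codebook."""
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

# A source-disentangling encoder would yield one latent stream per source;
# each stream is quantized only with its own domain's codebook.
latent_streams = {name: rng.normal(size=(8, dim)) for name in codebooks}
tokens = {name: quantize(z, codebooks[name]) for name, z in latent_streams.items()}
```

Because each domain's tokens index a separate codebook, a downstream model can edit or regenerate one source (say, the speech stream) without touching the others.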
SD-Codec improves the interpretability of the latent space in neural audio codecs by simultaneously learning to separate and resynthesize audio. In addition to helping preserve high-quality audio resynthesis, it provides additional control over the audio creation process by making it easier to distinguish between multiple sources. Because SD-Codec can separate sources within the latent space, it can manipulate the audio output more precisely, which is very useful for applications that need to generate or edit detailed audio.
According to experimental results, SD-Codec successfully disentangles multiple audio sources and performs at a competitive level in terms of audio resynthesis quality. This separation capability translates into improved interpretability, making it easier to understand and manipulate the generated audio.
The team has summarized its main contributions as follows:
- SD-Codec, a neural audio codec, has been proposed that separates input audio clips into distinct sources such as speech, music, and sound effects while reconstructing high-quality audio. This dual capability increases the codec's adaptability and utility across a variety of audio processing applications.
- The use of shared residual vector quantization (RVQ) in SD-Codec has been studied. The results show that performance is largely unchanged whether or not a common codebook is shared across sources. This highlights the hierarchical processing of the audio input within the codec and suggests that shallow RVQ levels store semantic information, while deeper layers concentrate on capturing local acoustic features.
- A large-scale dataset has been used to train SD-Codec, and the results show strong performance in both source separation and audio reconstruction. This extensive training helps ensure the model remains reliable across varied acoustic conditions.
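The hierarchical, coarse-to-fine behavior of RVQ described in the contributions can be sketched minimally: each stage quantizes the residual left over by the previous stages, so early stages carry the coarse content and later stages refine it. The stage count, dimensions, and random codebooks below are illustrative, not SD-Codec's trained parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, num_stages, codebook_size = 4, 3, 16

# One codebook per RVQ stage
stage_codebooks = [rng.normal(size=(codebook_size, dim)) for _ in range(num_stages)]

def rvq_encode(frames, codebooks):
    """Quantize frames stage by stage; each stage encodes the residual
    left over by the previous stages (coarse-to-fine)."""
    residual = frames.copy()
    all_ids = []
    recon = np.zeros_like(frames)
    for cb in codebooks:
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        ids = dists.argmin(axis=1)
        chosen = cb[ids]
        recon += chosen      # reconstruction is the sum over all stages
        residual -= chosen   # next stage sees only what remains
        all_ids.append(ids)
    return np.stack(all_ids), recon

frames = rng.normal(size=(6, dim))
ids, recon = rvq_encode(frames, stage_codebooks)  # ids: (num_stages, num_frames)
err = np.linalg.norm(frames - recon)  # residual error after all stages
```

Dropping the deeper stages at decode time still yields a coarse reconstruction, which is the sense in which shallow levels hold the semantic skeleton and deep levels add acoustic detail.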
In conclusion, SD-Codec is a major advancement in neural audio codecs, providing a more capable and controllable method for audio generation and compression.
Tanya Malhotra is a final year student of the University of Petroleum and Energy Studies, Dehradun, pursuing BTech in Computer Engineering with specialization in artificial intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.