Self-supervised learning plays a prominent role in building intelligent systems across artificial intelligence. Transformer models such as BERT and T5 have recently become popular thanks to their strong performance, and they rely on self-supervision for natural language processing tasks: the models are first pre-trained on massive amounts of unlabeled data and then fine-tuned on labeled samples. Although self-supervised learning has been applied successfully in fields including speech processing, computer vision, and natural language processing, its application to music audio remains largely unexplored. The main reason is the difficulty of modeling musical knowledge, in particular the tonal and pitched characteristics of music.
To address this problem, a team of researchers introduced MERT, an acoustic music understanding model with large-scale self-supervised training. MERT follows the masked language modeling (MLM) paradigm used by BERT: during pre-training, teacher models generate pseudo-labels for masked segments of audio, and the transformer encoder, acting as the student model, learns to predict them, which helps it build a better understanding of music audio.
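To make the teacher-student idea concrete, here is a minimal sketch of MLM-style pre-training on pseudo-labels. It is not the authors' implementation: the encoder architecture, the zero-masking strategy, and the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative masked-prediction step: a frozen teacher has already turned the
# audio into discrete pseudo-labels, a mask hides some frames from the student,
# and the student encoder is trained to predict the teacher's labels at the
# masked positions. Module names and shapes are assumptions for illustration.

class StudentEncoder(nn.Module):
    def __init__(self, dim=768, codebook_size=1024):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.head = nn.Linear(dim, codebook_size)  # predicts pseudo-label ids

    def forward(self, frames):                      # frames: (batch, time, dim)
        return self.head(self.encoder(frames))      # (batch, time, codebook_size)

def masked_prediction_loss(student, frames, pseudo_labels, mask_prob=0.3):
    """frames: (B, T, D) float features; pseudo_labels: (B, T) teacher ids."""
    mask = torch.rand(frames.shape[:2]) < mask_prob  # choose frames to hide
    masked_frames = frames.clone()
    masked_frames[mask] = 0.0                        # simple zero-masking
    logits = student(masked_frames)
    # Only masked positions contribute to the loss, as in MLM-style training.
    return nn.functional.cross_entropy(logits[mask], pseudo_labels[mask])
```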
This affordable and generalizable pre-trained acoustic music model follows the self-supervised speech learning paradigm and employs teacher models to generate pseudo-targets for sequential audio clips, using a multi-task paradigm to balance acoustic and musical representation learning. To improve the robustness of the learned representations, MERT introduces an in-batch noise mixture augmentation technique: audio recordings are corrupted by mixing them with random clips, challenging the model to extract the relevant information even when the signal is obscured. This addition improves the model's ability to generalize to situations where music is mixed with irrelevant audio.
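A rough sketch of what such an in-batch mixing step could look like follows; the mixing probability and gain range are illustrative assumptions rather than the paper's exact recipe.

```python
import torch

def in_batch_noise_mix(waveforms, mix_prob=0.5, gain_range=(0.1, 0.5)):
    """Mix each waveform with a random clip drawn from the same batch.

    waveforms: (batch, samples) float tensor. The probability and gain range
    are illustrative assumptions, not the exact values used for MERT.
    """
    batch = waveforms.shape[0]
    perm = torch.randperm(batch)                        # pick a "noise" clip per example
    gains = torch.empty(batch, 1).uniform_(*gain_range) # how loud the mixed-in clip is
    do_mix = (torch.rand(batch, 1) < mix_prob).float()  # only augment some examples
    return waveforms + do_mix * gains * waveforms[perm]
```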
The team has assembled an effective combination of teacher models that outperforms conventional speech and audio approaches. It pairs an acoustic teacher based on a Residual Vector Quantization – Variational AutoEncoder (RVQ-VAE) with a musical teacher based on the Constant-Q Transform (CQT). The acoustic teacher uses the RVQ-VAE to provide a discretized acoustic-level summary of the musical signal, capturing its acoustic characteristics, while the CQT-based musical teacher focuses on the pitch and tonal aspects of the music. Together, these teachers guide the student model toward meaningful representations of music audio.
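The two kinds of teacher targets can be sketched as follows. The CQT side is straightforward to illustrate with librosa; the RVQ-VAE side is shown only as a hypothetical codec interface, since a trained neural codec would be needed to produce real acoustic tokens.

```python
import numpy as np
import librosa

def cqt_target(waveform, sr=24000, n_bins=84, bins_per_octave=12):
    """Constant-Q spectrogram used as a pitch/tonality-oriented target.
    Parameter values are common defaults, not necessarily MERT's settings."""
    cqt = librosa.cqt(waveform, sr=sr, n_bins=n_bins,
                      bins_per_octave=bins_per_octave)
    return np.abs(cqt).T            # (time, n_bins) magnitude per frame

def acoustic_tokens(waveform, codec):
    """Placeholder for the acoustic teacher: a residual-vector-quantized codec
    maps the waveform to discrete code ids per frame (hypothetical interface)."""
    return codec.encode(waveform)   # e.g. (time, n_codebooks) integer ids
```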
The team has also explored configurations to address the instability of acoustic language model pre-training. By optimizing these settings, they were able to scale MERT from 95M to 330M parameters, resulting in a more powerful model capable of capturing intricate details of music audio. In evaluation, the experimental results demonstrated MERT's effectiveness at generalizing across a wide range of music understanding tasks: the model achieved state-of-the-art (SOTA) scores on 14 different tasks, demonstrating both strong performance and generalizability.
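For readers who want to experiment, the released checkpoints can reportedly be loaded through Hugging Face Transformers. The sketch below assumes a checkpoint id of m-a-p/MERT-v1-330M and the custom-code loading path; consult the official repository for the exact usage.

```python
import torch
from transformers import AutoModel, Wav2Vec2FeatureExtractor

# Checkpoint id and trust_remote_code requirement are assumptions based on the
# public release; check the project's GitHub page for the official instructions.
model = AutoModel.from_pretrained("m-a-p/MERT-v1-330M", trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained("m-a-p/MERT-v1-330M")

waveform = torch.randn(24000 * 5)   # 5 seconds of dummy audio at 24 kHz
inputs = processor(waveform.numpy(), sampling_rate=24000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Hidden states from different layers can serve as features for downstream
# music-understanding tasks (tagging, key detection, etc.).
features = torch.stack(outputs.hidden_states).mean(dim=0)
```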
In conclusion, the MERT model addresses the gap in applying self-supervised learning to music audio.
Check out the Paper and the GitHub link for more details.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.