Recent advances in deep learning have enabled end-to-end (E2E) automatic speech recognition (ASR) and raised its accuracy to a new level. E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven beneficial. However, LM fusion has drawbacks, such as its inability to address the domain mismatch of the internal AM. Inspired by the concept of LM fusion, we propose integrating an external AM into the E2E system to better address this domain mismatch. With this approach, we achieve a word error rate reduction of up to 14.3% across multiple test sets. We also find that AM fusion is particularly beneficial for improving named entity recognition.