Although large language models (LLMs) have shown promise for human-like conversations, they are primarily trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a fusion low-rank adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low-rank adaptation. For device-directed speech detection, the multimodal LLM with FLoRA achieves a relative 22% reduction in equal error rate (EER) over the text-only approach and reaches performance parity with its full fine-tuning (FFT) counterpart while tuning only a fraction of its parameters. Furthermore, with the newly introduced adapter loss, FLoRA is robust to missing data, achieving a 20% lower EER and a 56% lower false acceptance rate than FFT. The proposed approach scales well across model sizes from 16M to 3B parameters.
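To make the adaptation idea concrete, the sketch below illustrates the general recipe the abstract describes, under our own assumptions rather than the authors' released code: a frozen linear layer from a pre-trained text LLM is wrapped with a trainable low-rank update, and features from a hypothetical audio encoder are projected into the LLM's embedding space and concatenated with the text tokens (falling back to text only when the audio stream is missing, the case the adapter loss targets). All dimensions, names, and the fusion-by-concatenation choice are illustrative.

```python
from typing import Optional

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable rank-r update (LoRA-style)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


def fuse_modalities(text_embeds: torch.Tensor,
                    audio_feats: Optional[torch.Tensor],
                    audio_proj: nn.Linear) -> torch.Tensor:
    """Prepend projected audio embeddings to the text embeddings; if the audio
    stream is missing, fall back to text only."""
    if audio_feats is None:
        return text_embeds
    return torch.cat([audio_proj(audio_feats), text_embeds], dim=1)


# Usage sketch with made-up dimensions.
d_model, d_audio, rank = 512, 80, 8
frozen_layer = nn.Linear(d_model, d_model)              # stands in for an LLM projection
adapted_layer = LoRALinear(frozen_layer, rank=rank)     # only ~2*rank*d_model trainable params
audio_proj = nn.Linear(d_audio, d_model)                # trainable modality projection

text = torch.randn(2, 10, d_model)                      # (batch, text_tokens, d_model)
audio = torch.randn(2, 20, d_audio)                     # (batch, audio_frames, d_audio)
fused = fuse_modalities(text, audio, audio_proj)        # (2, 30, d_model)
out = adapted_layer(fused)                              # frozen weights + low-rank update
```

In this reading, only the low-rank matrices and the modality projection are trained, which is what keeps the tuned parameter count a small fraction of the full model, consistent with the parameter-efficiency claim in the abstract.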