*=Equal contributors
This paper was accepted at the Efficient Natural Language and Speech Processing workshop at NeurIPS 2023.
Interactions with virtual assistants typically begin with a predefined trigger phrase followed by the user's command. To make interactions with the assistant more natural, we explore whether it is feasible to remove the requirement that users begin each command with a trigger phrase. We address this task by combining decoder signals from an automatic speech recognition (ASR) system with acoustic and lexical representations as input features to a large language model (LLM). We are interested in data- and resource-efficient systems that require only a small amount of training data and can potentially run on devices such as smartphones. For this reason, our model is fine-tuned on a small amount of multimodal data using low-rank adaptation. We compare the proposed system with unimodal models that rely solely on lexical or acoustic information. We analyze the effectiveness of our method by fine-tuning decoder-only LLMs with 3 billion to 13 billion parameters on training data consisting of 10 thousand to 80 thousand utterances. We show that our best multimodal system outperforms the unimodal baselines while using only a fraction of the training data.
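To make the setup concrete, the sketch below illustrates one plausible way to combine acoustic representations with lexical ASR output as input to a decoder-only LLM fine-tuned with low-rank adaptation. It is a minimal illustration, not the authors' implementation: the base model name, the audio feature dimensionality, the projection layer, and the way decoder signals are rendered as text are all assumptions for demonstration purposes.

```python
# Minimal sketch (assumptions, not the paper's code): acoustic features are projected
# into the LLM embedding space, prepended to token embeddings of the ASR 1-best
# hypothesis plus decoder signals, and the LLM is adapted with LoRA.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder; the paper uses decoder-only LLMs with 3B-13B parameters
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = AutoModelForCausalLM.from_pretrained(model_name)

# Low-rank adaptation: only small adapter matrices are trained, keeping the LLM frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM")
llm = get_peft_model(llm, lora_cfg)

embed_dim = llm.get_input_embeddings().embedding_dim
audio_dim = 512  # assumed dimensionality of the acoustic representation
audio_proj = nn.Linear(audio_dim, embed_dim)  # maps audio features into the LLM embedding space

# Lexical input: ASR 1-best hypothesis and decoder signals rendered as text (illustrative format).
text = "asr: what's the weather tomorrow | confidence: 0.92"
tokens = tokenizer(text, return_tensors="pt")
token_embeds = llm.get_input_embeddings()(tokens["input_ids"])

# Acoustic input: a short sequence of audio-encoder frames (random placeholder here).
audio_feats = torch.randn(1, 4, audio_dim)
audio_embeds = audio_proj(audio_feats)

# Prepend the projected audio embeddings to the token embeddings and run the LLM.
inputs_embeds = torch.cat([audio_embeds, token_embeds], dim=1)
attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
out = llm(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
print(out.logits.shape)  # (1, seq_len, vocab_size); a decision head for trigger-free detection would sit on top
```

In this arrangement only the LoRA adapters and the small projection layer carry trainable parameters, which is consistent with the data- and resource-efficiency goal stated above; the exact fusion and decision mechanism in the paper may differ.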