Large multimodal language models with low-rank adaptation fusion for device-directed speech detection
Although large language models (LLMs) have shown promise for human-like conversations, they are primarily trained on text data. Incorporating audio ...